Abstract: Effective complexity measures the information content
of the regularities of an object. It has been introduced by M.
Gell-Mann and S. Lloyd to avoid some of the disadvantages of 
Kolmogorov complexity, also known as algorithmic information 
content. In this paper, we give a precise formal definition of 
effective complexity and rigorous proofs of its basic properties. 
In particular, we show that incompressible binary strings are 
effectively simple, and we prove the existence of strings that have 
effective complexity close to their lengths. Furthermore, we show 
that effective complexity is related to Bennett's logical depth: If 
the effective complexity of a string x exceeds a certain explicit 
threshold then that string must have astronomically large depth; 
otherwise, the depth can be arbitrarily small. 

Index Terms — Effective Complexity, Kolmogorov Complexity, 
Algorithmic Information Content, Bennett's Logical Depth, Kol- 
mogorov Minimal Sufficient Statistics, Shannon Entropy. 

I. Introduction and Main Results 

WHAT is complexity? A great deal of research has been 
performed on the question in what sense some objects 
are "more complicated" than others, and how this fact and its 
consequences can be analyzed mathematically. 

One of the best-known complexity measures is Kolmogorov
complexity [1], also called algorithmic complexity
or algorithmic information content. In short, the Kolmogorov
complexity of some finite binary string $x$ is the length of the
shortest computer program that produces $x$ on a universal computer.
So Kolmogorov complexity quantifies how well a string
can in principle be compressed. This notion of complexity 
has found various interesting applications in mathematics and 
computer science. 

Despite its usefulness, Kolmogorov complexity does not 
capture the intuitive notion of complexity very well. For 
example, random strings without any regularities, say strings 
that are constructed bitwise by repeated tosses of a fair coin, 
have very large Kolmogorov complexity. But those strings 
are not "complex" from an intuitive point of view — those 
strings are completely random and do not carry any interesting 
structure at all. 
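The contrast between random and regular strings can be illustrated numerically. The following is only a rough sketch: Kolmogorov complexity is not computable, so we substitute the output length of the general-purpose compressor `zlib`, which is merely a crude computable upper-bound proxy. The names `compressed_length_bits`, `coin_tosses` and `periodic` are ours and purely illustrative.

```python
import random
import zlib

def compressed_length_bits(data: bytes) -> int:
    """Length in bits of a zlib encoding of `data` (a crude stand-in for K)."""
    return 8 * len(zlib.compress(data, 9))

n_bytes = 4096
rng = random.Random(0)  # fixed seed so that the run is reproducible

# Simulated fair coin tossing: 8 pseudo-random bits per byte.
coin_tosses = bytes(rng.getrandbits(8) for _ in range(n_bytes))
# A highly regular string of the same length.
periodic = b"01" * (n_bytes // 2)

random_bits = compressed_length_bits(coin_tosses)
periodic_bits = compressed_length_bits(periodic)
print(random_bits, periodic_bits)
```

Under this proxy the coin-toss string is essentially incompressible while the periodic string shrinks drastically, although neither of the two would intuitively be called "complex".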

Effective complexity is an attempt by M. Gell-Mann and 
S. Lloyd [2], [3] to define some complexity measure that is
closer to the intuitive notion of complexity and overcomes 
the difficulties of Kolmogorov complexity. The main idea of 
effective complexity is to split the algorithmic information 
content of some string x into two parts, its random features and 
its regularities. Then, the effective complexity of x is defined
as the algorithmic information content of the regularities alone. 

In this paper, we are interested in the basic properties of 
effective complexity, and how it relates to other complexity 
measures. In particular, we give a more precise formal defi- 
nition of effective complexity than has been done previously. 
We use this formal framework to give detailed proofs of the 
properties of effective complexity, and we use it to show an unexpected
relation between effective complexity and Bennett's
logical depth [4].

Since there are now so many different complexity measures [5],
our result contributes to the clarification of the
interrelations within this "zoo" of complexity measures. More- 
over, we hope that our more formal approach helps to find 
applications of effective complexity within mathematics, in 
a similar manner as this has been done for Kolmogorov 
complexity. 

We now describe our main results and give a synopsis of 
this paper: 

• After some notational preliminaries in Section II, we
motivate and state the main definition of effective complexity
in Section III.

• In Section IV, we analyze the basic properties of effective
complexity. In particular, we show in Theorem 10 that
effective complexity indeed avoids the disadvantage of 
Kolmogorov complexity that we have explained above: 
Random strings are effectively simple. 

Although the existence of effectively complex strings
has been mentioned in [2], it has not been conjectured
explicitly. Based on the notion of algorithmic statistics as
studied by Gács et al. [6], we provide a formal existence
proof; see Theorem 14.

• Section V contains our main result (Theorem 18 and
Theorem 19): the relation between effective complexity
and logical depth. In short, it states that if the effective 
complexity of some string exceeds a certain explicit 
threshold, then the time it takes to compute that string 
from a short description must be astronomically large. 
This threshold is in some sense very sharp, such that 
the behavior of logical depth with respect to effective 
complexity is comparable to that of a phase transition 
(cf. Fig. 2).

• In Section VI, we show how effective complexity is
related to the notion of Kolmogorov minimal sufficient 
statistics. 

• Finally, in the Appendix, we give an explicit example 
of a computable ensemble on the binary strings that has 
non-computable entropy. This illustrates the necessity of 
the details of our definition in Section III.
We start by introducing notation. 

II. Preliminaries and Notation 

We denote the finite binary strings $\{\lambda, 0, 1, 00, 01, \ldots\}$ by
$\{0,1\}^*$, where $\lambda$ is the empty string, and we write $\ell(x)$ for
the length of a binary string $x \in \{0,1\}^*$. An ensemble $\mathbb{E}$ is a
probability distribution on $\{0,1\}^*$. All logarithms are in base
2.

We assume that the reader is familiar with the basic concepts
of Kolmogorov complexity; a good reference is the book by
Li and Vitányi [1]. There is a "plain" and a "prefix" version of
Kolmogorov complexity, and we will use both of them in this
paper. The plain Kolmogorov complexity $C(x)$ of some string
$x$ is defined as the length of the shortest computer program
that outputs $x$ if it is given as input to a universal computer
$V$,

$$C(x) := \min\{\ell(p) \mid V(p) = x\}.$$

Prefix Kolmogorov complexity is defined analogously, but
with respect to a universal prefix computer $U$. A prefix
computer $U$ has the property that if $U(s)$ is defined for some
string $s$, then $U(st)$ is undefined for every string $t$ that is not
the empty string. So

$$K(x) := \min\{\ell(p) \mid U(p) = x\}.$$

There are different possible choices of $U$ and $V$; we fix one
of them for the rest of the paper.
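The defining property of a prefix computer says that its domain (the set of programs on which it halts) is prefix-free. As a small illustration, the following sketch checks this property for a finite set of candidate programs; the function name `is_prefix_free` is ours, and the check says nothing about universality.

```python
def is_prefix_free(programs):
    """Return True if no string in `programs` is a proper prefix of another."""
    ordered = sorted(set(programs))
    # In lexicographic order, a string that is a prefix of some other string
    # is also a prefix of its immediate successor, so checking neighbours suffices.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

print(is_prefix_free(["0", "10", "110"]))  # a valid domain for a prefix computer
print(is_prefix_free(["0", "01"]))         # "0" is a prefix of "01"
```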

Several variations of Kolmogorov complexity can be easily 
defined and will be used in this paper, for example, the 
complexity of a finite list of strings, or the complexity of an 
integer or a real number. With a few exceptions below, we 
will not discuss the details of the definitions here and instead 
refer the reader to Ref. [1].

The first exception that deserves a more detailed discussion
is conditional complexity. There are two versions of conditional
complexity, a "naive" one and a more sophisticated one.
The naive definition is

$$K(x|y) := \min\{\ell(p) \mid p \in \{0,1\}^*,\ U(p,y) = x\}, \qquad (1)$$

that is, the complexity of producing string $x$, given string $y$
as additional "free" information. A more sophisticated version
due to Chaitin [7] reads

$$K_*(x|y) := \min\{\ell(p) \mid p \in \{0,1\}^*,\ U(p,y^*) = x\}, \qquad (2)$$

that is, the complexity of producing $x$, given a minimal
program $y^*$ for $y$. The advantage of $K_*(\cdot|\cdot)$ compared with
$K(\cdot|\cdot)$ is the validity of a chain rule

$$K(x,y) \stackrel{+}{=} K(y) + K_*(x|y) \qquad (3)$$

for all strings $x$ and $y$. Here we make use of a well-known
notation [6] which helps to suppress additive constants:
Suppose that $f, g : \{0,1\}^* \to \mathbb{N}$ are functions on the binary
strings, and there is some $c \in \mathbb{N}$ independent of the argument
value $s$ such that $f(s) \le g(s) + c$ for every $s \in \{0,1\}^*$, i.e.
the inequality holds uniformly for $s \in \{0,1\}^*$. Then we write

$$f(s) \stackrel{+}{<} g(s) \qquad (s \in \{0,1\}^*).$$




We use the notation $\stackrel{+}{=}$ if both $\stackrel{+}{<}$ and $\stackrel{+}{>}$ hold.

Note that the "naive" form of conditional complexity as
defined in (1) does not satisfy the chain rule (3). Only the
weaker inequality

$$K(x,y) \stackrel{+}{<} K(y) + K(x|y) \qquad (4)$$

holds in general.

We will often use obvious relations like $K(x) \stackrel{+}{<} K(x,y)$
or $K(x,y) \stackrel{+}{=} K(y,x)$ without explaining in detail where they
come from; we again refer the reader to the book by Li and
Vitányi [1].

Another important prerequisite for this paper is the definition
of the prefix Kolmogorov complexity $K(\mathbb{E})$ of some
ensemble $\mathbb{E}$. In contrast to bit strings, there are several
inequivalent notions of a "description", and we can learn from
Ref. [6] the lesson that it is very important to exactly specify
which of them we will use.

Our definition of $K(\mathbb{E})$ for ensembles $\mathbb{E}$ is as follows. First,
a program that computes $\mathbb{E}$ is a computer program that expects
two inputs, namely a string $s \in \{0,1\}^*$ and an integer $n \in \mathbb{N}$,
and that outputs (the binary digits of) an approximation of
$\mathbb{E}(s)$ with accuracy of at least $2^{-n}$. Then, our preliminary
definition of $K(\mathbb{E})$ is the length of the shortest program for
the universal prefix computer $U$ that computes $\mathbb{E}$.
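To make the notion of "a program that computes an ensemble" concrete, here is a minimal sketch for one hand-picked toy ensemble, $\mathbb{E}(s) = 2^{-(2\ell(s)+1)}$, which is normalized because the $2^l$ strings of length $l$ carry total weight $2^{-(l+1)}$. The ensemble and the name `ensemble_approx` are our own illustrative choices; in this toy case the exact value is available, which need not be true in general.

```python
from fractions import Fraction

def ensemble_approx(s: str, n: int) -> Fraction:
    """Return a dyadic rational within 2**-n of E(s) = 2**-(2*len(s)+1)."""
    exact = Fraction(1, 2 ** (2 * len(s) + 1))
    # Keep only n binary digits after the point; the truncation error is below 2**-n.
    return Fraction(int(exact * 2 ** n), 2 ** n)

# Usage: approximate E("01") = 2**-5 to accuracy 2**-8.
print(ensemble_approx("01", 8))
```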

Obviously, not every ensemble $\mathbb{E}$ is computable: there is
a continuum of string ensembles, but there are only countably
many algorithms that compute ensembles. Another unexpected
difficulty concerns the entropy of a computable ensemble,
defined as $H(\mathbb{E}) := -\sum_{x \in \{0,1\}^*} \mathbb{E}(x) \log \mathbb{E}(x)$. Contrary to
a first naive guess, the entropy of a computable ensemble does
not need to be computable; all we know for sure is that it is
enumerable from below. To illustrate this, we give an explicit
example of a computable ensemble with a non-computable
entropy in Example 22 in Appendix A.

Thus, for the rest of the paper, we assume that all 
ensembles are computable and have computable and finite 
entropy $H(\mathbb{E})$, unless stated otherwise.

Even when one restricts to the set of ensembles with
computable entropy, the map $\mathbb{E} \mapsto H(\mathbb{E})$ is not necessarily
a computable function. Hence the approximate equality

$$K(\mathbb{E}, H(\mathbb{E})) \stackrel{+}{=} K(\mathbb{E})$$

is not necessarily true uniformly in $\mathbb{E}$. Thus, from now on we
replace the preliminary definition $K(\mathbb{E})$ by

$$K(\mathbb{E}) := K(\mathbb{E}, H(\mathbb{E})),$$

i.e. we assume that computer programs for ensembles
$\mathbb{E}$ carry additionally a subprogram that computes the
entropy $H(\mathbb{E})$.

III. Definition of Effective Complexity 

To define the notion of effective complexity, we follow the 
steps described in one of the original manuscripts by M. Gell-Mann
and S. Lloyd [3]. First, they define the total information
of an ensemble as the sum of the ensemble's entropy and 
complexity. 
To understand the motivation behind this definition, suppose
we are given some data $x$ (a finite binary string) which has
been generated by an unknown stochastic process. We would
like to make a good guess on the process that generated $x$,
even if we only have one sample of the process. This is
similar to a scientist who tries to find a (probabilistic) theory of
physics, given only the present state of the universe. To make
a good guess on the probability distribution or ensemble $\mathbb{E}$
that produced $x$, we make two natural assumptions:

• The explanation should be simple. In terms of Kolmogorov
complexity, this means that $K(\mathbb{E})$ should be
small.

• The explanation should not allow all possible outcomes,
but should prefer some outcomes (including $x$) over
others. For example, the uniform distribution on a billion
different possible physical theories is "simple" (i.e. $K(\mathbb{E})$
is small), but it is not a "good explanation" of our physical
world because it contains a huge amount of arbitrariness.
This arbitrariness can be identified with the "measure of
ignorance", the entropy of $\mathbb{E}$. Thus, it is natural to demand
that the entropy $H(\mathbb{E})$ shall be small.

Putting both assumptions together, it is natural to consider the
sum $K(\mathbb{E}) + H(\mathbb{E})$, which is called the "total information"
$\Sigma(\mathbb{E})$. A "good theory" is then an ensemble $\mathbb{E}$ with small
$\Sigma(\mathbb{E})$.

Definition 1 (Total Information):
For every ensemble $\mathbb{E}$ with entropy $H(\mathbb{E}) :=
-\sum_{x \in \{0,1\}^*} \mathbb{E}(x) \log \mathbb{E}(x)$, we define the total information
$\Sigma(\mathbb{E})$ of $\mathbb{E}$ as

$$\Sigma(\mathbb{E}) := K(\mathbb{E}) + H(\mathbb{E}).$$

Note that the total information is a real number larger than
or equal to 1. If $\mathbb{E}$ is computable and has finite entropy, as
always assumed in this paper, then $\Sigma(\mathbb{E})$ is finite.

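In the subsequent work [2] by M. Gell-Mann and S. Lloyd,
it has been pointed out that $H(\mathbb{E}) \approx \sum_{s \in \{0,1\}^*} \mathbb{E}(s) K(s|\mathbb{E})$.
It follows that

$$\Sigma(\mathbb{E}) \approx \sum_{s \in \{0,1\}^*} \mathbb{E}(s) \left( K(s|\mathbb{E}) + K(\mathbb{E}) \right). \qquad (5)$$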

This has a nice interpretation: The total information gives the 
average complexity of computing a string with the detour of 
computing the ensemble. 

The next step in [3] is to explain what is meant by a string
being "typical" for an ensemble. Going back to the analogy
of a scientist trying to find a theory $\mathbb{E}$ explaining his data $x$,
a good theory should in fact predict that the appearance of
$x$ has non-zero probability. Even more, the probability $\mathbb{E}(x)$
should not be too small; it should be at least as large as that
of "typical" outcomes of the corresponding process.

What is the probability of a "typical" outcome of a random
experiment? Suppose we toss a biased coin with probability
$p$ for heads and $1 - p =: q$ for tails $n$ times, and call the
resulting probability distribution $\mathbb{E}$. Then it turns out that
typical outcomes $x$ have probability $\mathbb{E}(x)$ close to $2^{-nH}$,
where $H := -p \log p - q \log q$, and $n \cdot H$ is the entropy of
$\mathbb{E}$. In fact, the probability that $\mathbb{E}(x)$ lies in between $2^{-n(H+\varepsilon)}$
and $2^{-n(H-\varepsilon)}$ for $\varepsilon > 0$ tends to one as $n$ gets large. In
information theory, this is called the "asymptotic equipartition
property" (cf. Ref. [8]). An appropriately extended version
of this result holds for a large class of stochastic processes,
including ergodic processes.
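The asymptotic equipartition property for the biased coin can be observed in a small simulation. The sketch below estimates, by sampling, how often the probability of an $n$-toss sequence falls into the band $[2^{-n(H+\varepsilon)}, 2^{-n(H-\varepsilon)}]$; the function name `aep_fraction` and all parameter values are our own illustrative choices.

```python
import math
import random

def aep_fraction(n: int, p: float, eps: float, trials: int, seed: int = 0) -> float:
    """Fraction of sampled n-toss sequences whose probability lies in the typical band."""
    rng = random.Random(seed)
    q = 1.0 - p
    H = -p * math.log2(p) - q * math.log2(q)  # entropy per toss
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        # log2 of the probability of the sampled sequence
        log_prob = heads * math.log2(p) + (n - heads) * math.log2(q)
        if -n * (H + eps) <= log_prob <= -n * (H - eps):
            hits += 1
    return hits / trials

print(aep_fraction(n=20, p=0.3, eps=0.05, trials=500))
print(aep_fraction(n=2000, p=0.3, eps=0.05, trials=200))
```

For short sequences only a minority of samples falls into the band, whereas for long sequences almost all of them do.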

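This motivates defining a string $x$ to be typical for an
ensemble $\mathbb{E}$ if its probability is not much smaller than $2^{-H(\mathbb{E})}$.

Definition 2 ($\delta$-Typical String):
Let $\mathbb{E}$ be an ensemble, $x \in \{0,1\}^*$ a string and $\delta \ge 0$. We
say that $x$ is $\delta$-typical for $\mathbb{E}$, if

$$\mathbb{E}(x) \ge 2^{-H(\mathbb{E})(1+\delta)}.$$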

We return to the scenario of the scientist who looks for good
theories (ensembles $\mathbb{E}$) explaining his data $x$. As discussed
above, it is natural to look for theories with small total
information $\Sigma(\mathbb{E})$. Moreover, the theory should predict $x$ as
a "typical" outcome of the corresponding random experiment,
that is, $x$ should be $\delta$-typical for $\mathbb{E}$ for some small constant $\delta$.
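For ensembles with finite support, the typicality condition $\mathbb{E}(x) \ge 2^{-H(\mathbb{E})(1+\delta)}$ can be checked directly. The following sketch represents such an ensemble as a Python dictionary from strings to probabilities; this representation and the names `entropy` and `is_delta_typical` are assumptions made for illustration only.

```python
import math

def entropy(ensemble: dict) -> float:
    """Shannon entropy (base 2) of a finitely supported ensemble."""
    return -sum(p * math.log2(p) for p in ensemble.values() if p > 0)

def is_delta_typical(x: str, ensemble: dict, delta: float) -> bool:
    """Check E(x) >= 2^{-H(E)(1+delta)}, compared on the log scale."""
    p = ensemble.get(x, 0.0)
    return p > 0 and math.log2(p) >= -entropy(ensemble) * (1 + delta)

# Toy ensemble with entropy 1.75 bits.
toy = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
print(is_delta_typical("00", toy, 0.0))
print(is_delta_typical("11", toy, 0.5))
print(is_delta_typical("11", toy, 1.0))
```

In this toy ensemble, the less probable string "11" becomes typical only once $\delta$ is large enough that $1.75\,(1+\delta) \ge 3$.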

How small can the total information of such a theory be? 
The next lemma shows that the answer is "not too small". 

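Lemma 3 (Minimal Total Information):
It uniformly holds for $x \in \{0,1\}^*$ and $\delta \ge 0$ that

$$\frac{K(x)}{1+\delta} \stackrel{+}{<} \inf\{\Sigma(\mathbb{E}) \mid x \text{ is } \delta\text{-typical for } \mathbb{E}\} \stackrel{+}{<} K(x).$$

Remark. The upper bound $K(x)$ and the computability of
$\mathbb{E}$ show that the set is finite, and the infimum is indeed a
minimum.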

Proof: Fix some $\delta \ge 0$ and some $x \in \{0,1\}^*$. Clearly, $x$
is $\delta$-typical for the singlet distribution $\mathbb{E}_x$, given by $\mathbb{E}_x(x) = 1$
and $\mathbb{E}_x(x') = 0$ for every $x' \ne x$. This ensemble has entropy
$H(\mathbb{E}_x) = 0$. Thus, the total information $\Sigma(\mathbb{E}_x)$ equals the
complexity $K(\mathbb{E}_x)$. We also have

$$K(\mathbb{E}_x) \stackrel{+}{=} K(x),$$

as describing the ensemble $\mathbb{E}_x$ boils down to describing the
string $x$. Furthermore, the corresponding additive constant
does not depend on $x$ or $\delta$. It follows that $\inf\{\Sigma(\mathbb{E})\} \stackrel{+}{<} K(x)$.

To prove the converse, suppose $\mathbb{E}$ is any ensemble such that
$x$ is $\delta$-typical for $\mathbb{E}$. Then we have the chain of inequalities

$$K(x) \stackrel{+}{<} K(x, \mathbb{E})$$
$$\stackrel{+}{<} K(\mathbb{E}) + K(x|\mathbb{E})$$
$$\stackrel{+}{<} K(\mathbb{E}) + \lceil -\log \mathbb{E}(x) \rceil$$
$$\le K(\mathbb{E}) + \lceil H(\mathbb{E})(1+\delta) \rceil \qquad (6)$$
$$\le \Sigma(\mathbb{E}) + \delta H(\mathbb{E}) + 1 \qquad (7)$$
$$\le \Sigma(\mathbb{E})(1+\delta) + 1. \qquad (8)$$

The first two inequalities follow from general properties of
prefix Kolmogorov complexity, while the third inequality is
due to the upper bound

$$K(x|\mathbb{E}) \stackrel{+}{<} \lceil -\log \mathbb{E}(x) \rceil, \qquad (9)$$

which follows from coding every string $x$ with $\mathbb{E}(x) \ne 0$ into
a prefix code word of length $\lceil -\log \mathbb{E}(x) \rceil$ (such a code exists
due to the Kraft inequality). Moreover, (6) is a consequence
of $\delta$-typicality of $x$ for $\mathbb{E}$, and (7) uses the definition of the
total information $\Sigma$. □
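The coding step behind the bound on $K(x|\mathbb{E})$ rests on the fact that the lengths $\lceil -\log \mathbb{E}(x) \rceil$ satisfy the Kraft inequality, so that a prefix code with these word lengths exists. The sketch below verifies this numerically for a small, arbitrarily chosen ensemble; the dictionary representation and the function names are illustrative assumptions.

```python
import math

def shannon_code_lengths(ensemble: dict) -> dict:
    """Code word lengths ceil(-log2 E(x)) for every x with E(x) != 0."""
    return {x: math.ceil(-math.log2(p)) for x, p in ensemble.items() if p > 0}

def kraft_sum(lengths: dict) -> float:
    """Sum of 2^{-length}; a prefix code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths.values())

toy = {"0": 0.6, "10": 0.3, "11": 0.1}
lengths = shannon_code_lengths(toy)
print(lengths, kraft_sum(lengths))
```

Since $2^{-\lceil -\log \mathbb{E}(x) \rceil} \le \mathbb{E}(x)$ for every $x$, the Kraft sum is bounded by the total probability, which is 1.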

The ultimate goal of effective complexity is to assign a
useful complexity measure $\mathcal{E}(x)$ to strings $x$. In our analogy,
this means that the scientist wants to assign a natural number
to his data $x$ saying how "complex" $x$ is. Simply taking the
Kolmogorov complexity $K(x)$ as this value has important
drawbacks: It does not at all capture the intuition that "complexity"
should measure the "amount of structure" of an object.
In fact, if $x$ is uniformly random (i.e. the result of fair coin
tossing), then $K(x)$ is large, while the string possesses almost
no structure at all.

The strategy of S. Lloyd and M. Gell-Mann [3] is instead to
take the complexity $K(\mathbb{E})$ of "the best" theory $\mathbb{E}$ that explains
the data $x$. What is "the best" theory? As already discussed,
a good theory should have small total information $\Sigma(\mathbb{E})$, and
the data $x$ should be "typical" for $\mathbb{E}$ in the sense that the
probability $\mathbb{E}(x)$ is not much smaller than $2^{-H(\mathbb{E})}$.

Given some data x, there are always many "good theories" 
which satisfy these requirements. Which one is "the best"? 
To think about this question, it is helpful to look at a 
graphical representation of "good theories" and their properties 
as described in [3] and depicted in Fig. 1.

Fig. 1. The minimization domain of effective complexity. Plotted are only
those ensembles $\mathbb{E}$ for which the fixed string $x$ is typical. (Labels recovered
from the figure: $K(\mathbb{E})$ and $K(x)$.)

Suppose we plot the set of theories in the entropy-complexity
plane. That is, for every computable ensemble $\mathbb{E}$
(with finite and computable entropy), we plot a black dot in
the plane, where the x-axis labels the entropy $H(\mathbb{E})$ and the
y-axis labels the Kolmogorov complexity $K(\mathbb{E})$.

The Kolmogorov complexity $K(\mathbb{E})$ is integer-valued, and if
$n \in \mathbb{N}$ is small, there are only few ensembles $\mathbb{E}$ with $K(\mathbb{E}) = n$
(in fact, the number of such ensembles is upper-bounded
by $2^n$). Thus, there are only few black dots at small values of
the y-axis. Going up the y-axis, the number of ensembles and
hence the density of the black dots increases.

The total information $\Sigma(\mathbb{E})$ is the sum of the ensemble's
entropy and complexity. Thus, ensembles with constant total
information correspond to lines in the plane that are parallel
to the tilted line in Fig. 1.



Suppose we fix some data $x$ and plot only those ensembles
$\mathbb{E}$ such that $x$ is $\delta$-typical for $\mathbb{E}$ for some fixed constant $\delta \ge 0$.
This was one of our two requirements that a "good theory" $\mathbb{E}$
for $x$ should fulfill. That is, we dismiss all the ensembles for
which $x$ is not a typical realization.

Then Lemma 3 tells us that all the remaining ensembles
must, up to an additive constant, have total information larger
than $K(x)/(1+\delta)$. Graphically, this means that all those
ensembles must approximately lie right of the straight line with
$H(\mathbb{E}) + K(\mathbb{E}) = K(x)$. One of these ensembles is the Dirac
measure $\delta_x$, the ensemble with $\delta_x(x) = 1$ and $\delta_x(x') = 0$
for $x \ne x'$: It has Kolmogorov complexity $K(\delta_x) \stackrel{+}{=} K(x)$
and entropy $H(\delta_x) = 0$, hence minimal total information. It
corresponds to the circle on the y-axis at the left end of the
line.

We also have discussed a second requirement for a "good
theory": The total information should be as small as possible.
According to Lemma 3, this means that $\Sigma(\mathbb{E})$ should not
be much larger than the Kolmogorov complexity $K(x)$. We
identify the "good" theories as those ensembles that are not
too far away from the line in Fig. 1; say, we consider those
ensembles as "good" that are below the dotted line with
$\Sigma(\mathbb{E}) = K(x) + \Delta$.

Among the remaining good theories, which one is "the
best"? The convincing suggestion by M. Gell-Mann and S.
Lloyd is that the best theory is the simplest theory; that is, the
ensemble $\mathbb{E}$ with the minimal Kolmogorov complexity $K(\mathbb{E})$.
The complexity $K(\mathbb{E})$ of this minimizing ensemble is then
defined as the effective complexity of $x$.

In other words, the effective complexity of x is defined as 
the smallest possible Kolmogorov complexity of any "good 
theory" (satisfying the two requirements) for x. This suggests 
the following preliminary definition (we discuss an important 
modification below): 

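Definition 4 (Effective Complexity I):
Given parameters $\delta, \Delta \ge 0$, the effective complexity $\mathcal{E}_{\delta,\Delta}(x)$
of any string $x \in \{0,1\}^*$ is defined as

$$\mathcal{E}_{\delta,\Delta}(x) := \inf\{K(\mathbb{E}) \mid x \text{ is } \delta\text{-typical for } \mathbb{E},\ \Sigma(\mathbb{E}) \le K(x) + \Delta\},$$

or as $\infty$ if this set is empty.

We refer to the set on the right-hand side as the minimization
domain of $x$ for effective complexity, and denote it by
$\mathcal{P}_{\delta,\Delta}(x)$. Thus

$$\mathcal{E}_{\delta,\Delta}(x) = \min_{\mathbb{E} \in \mathcal{P}_{\delta,\Delta}(x)} K(\mathbb{E}).$$

Note that ensembles $\mathbb{E}$ of the minimization domain $\mathcal{P}_{\delta,\Delta}(x)$
of $x \in \{0,1\}^*$ satisfy

$$\frac{K(x)}{1+\delta} \stackrel{+}{<} \Sigma(\mathbb{E}) \le K(x) + \Delta.$$

This notion of effective complexity is closely related to a
quantity called "Kolmogorov minimal sufficient statistics". We
explain this fact in more detail in Definition 20 and Lemma 21
in Section VI below.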
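The structure of this minimization can be mimicked on a finite list of candidate theories. The sketch below is purely schematic: Kolmogorov complexity is not computable, so the values `K` and `K_x` are numbers we supply by hand as hypothetical stand-ins, and the three candidates are invented for illustration.

```python
import math
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    K: int         # assumed description length, standing in for K(E)
    H: float       # entropy H(E)
    prob_x: float  # probability E(x) assigned to the observed string x

def effective_complexity(cands, K_x, delta, Delta):
    """Minimal K over candidates with x delta-typical and K + H <= K_x + Delta."""
    domain = [
        c for c in cands
        if c.prob_x > 0
        and math.log2(c.prob_x) >= -c.H * (1 + delta)  # delta-typicality
        and c.K + c.H <= K_x + Delta                   # small total information
    ]
    return min((c.K for c in domain), default=math.inf)

# Hypothetical data: x has length 100 and assumed complexity K_x = 100.
cands = [
    Candidate("Dirac measure on x", K=102, H=0.0, prob_x=1.0),
    Candidate("uniform on length 100", K=10, H=100.0, prob_x=2.0 ** -100),
    Candidate("uniform on length 300", K=5, H=300.0, prob_x=2.0 ** -300),
]
for Delta in (1, 5, 12):
    print(Delta, effective_complexity(cands, K_x=100, delta=0.0, Delta=Delta))
```

In this toy setting the value drops from infinity to the complexity of the Dirac measure and then to that of the uniform ensemble as $\Delta$ grows, which mirrors the dependence on $\Delta$ discussed in the text.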

As pointed out by M. Gell-Mann and S. Lloyd, it is often 
useful to extend this definition of effective complexity by 
imposing additional conditions ("constraints") on the ensembles
that are allowed in the minimization domain. There
are basically two intuitive reasons why this is useful. To 
understand those reasons, we go back to the scenario of a 
scientist looking for good theories to explain his data x. Recall 
the interpretation of the minimization domain of effective 
complexity as the set of "good theories" for x. Reasons for 
considering constraints on the ensembles are: 

• Given the string $x$, there might be certain properties of $x$
that the scientist judges to be important. Those properties
should be explained by the theories in the sense that the
properties are not just simple random coincidences, but
necessary or highly probable properties of each outcome
of the corresponding process.

For example, suppose that a scientist wants to find good
theories that explain the present state of our universe. In
addition, that scientist finds it particularly important and
interesting that the value of the fine structure constant is
about 1/137 and would like to find theories that explain
why this constant is close to that value. Then, he will
only accept ensembles $\mathbb{E}$ that have expectation value of
this constant not too far away from 1/137.
In terms of effective complexity, this scientist views the
appearance of a fine structure constant of about 1/137 as
an important structural (non-random) property of $x$, the
encoded state of our universe. Thus, he considers this
property as part of the regularities of $x$. Effective complexity
is the Kolmogorov complexity of the regularities
of $x$; thus, this scientist tends to find a larger value of
effective complexity than other scientists who consider
the fine structure constant as unimportant and random.
• The scientist might have additional information on the 
process that actually created x. This situation is often 
encountered in thermodynamics. Suppose that x encodes 
some microscopic properties of a gas in a container that a 
scientist has measured. In addition to these measurement 
results, the scientist typically also has information on 
several macroscopic observables like the temperature or 
the total energy in the box — at least, crude upper bounds 
are usually given by basic properties of the laboratory 
physics. Then, a "good theory" consistent with the actual 
physical process within the lab must obey the additional 
constraints given by the macroscopic observables. 
In terms of effective complexity, the macroscopic observables
(or, more generally, the additional information) contribute
to the regularities of $x$ and enlarge its effective complexity.

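Definition 5 (Effective Complexity II):
Given parameters $\delta, \Delta \ge 0$, the effective complexity $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$
of a string $x$ constrained to a subset $\mathcal{C}$ of all ensembles is defined as

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) := \inf\{K(\mathbb{E}) \mid x \text{ is } \delta\text{-typical for } \mathbb{E},\ \mathbb{E} \in \mathcal{C},\ \Sigma(\mathbb{E}) \le K(x) + \Delta\} \qquad (10)$$

or as $\infty$ if this set is empty. The set on the right-hand side is
called the constrained minimization domain of $x$ for effective
complexity and equals $\mathcal{P}_{\delta,\Delta}(x) \cap \mathcal{C}$ according to the notation
of Definition 4. Thus, $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) = \min_{\mathbb{E} \in \mathcal{P}_{\delta,\Delta}(x) \cap \mathcal{C}} K(\mathbb{E})$.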



Note that we allow that $\mathcal{C}$ depends on $x$. This makes it possible,
for example, to introduce the constraint that $x$ is an element
of an ensemble of strings that all have the same length: Just
take $\mathcal{C}$ as the set of probability distributions on $\{0,1\}^n$, where
$n = \ell(x)$.

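In general, restricting the set of ensembles can only increase the
value of effective complexity, i.e.

$$\mathcal{C} \subset \mathcal{D} \ \Rightarrow\ \mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) \ge \mathcal{E}_{\delta,\Delta}(x|\mathcal{D}).$$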

This is in agreement with our intuition. Indeed such restric- 
tions give a way to demand explicitly that some regularities of 
the considered string x appear as a consequence of regularities 
of the generating process. As such, they contribute to the 
effective complexity. 

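If the constrained set $\mathcal{C}$ or the constant $\Delta$ are too small
such that the (constrained) minimization domain $\mathcal{P}_{\delta,\Delta}(x) \cap \mathcal{C}$
is empty, then effective complexity is infinite, according to
Definition 5. Furthermore, note that

• $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$ is decreasing in $\delta$ and $\Delta$,

• if $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$ is finite and $x \in \{0,1\}^n$, then

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) \le K(x) + \Delta \le n + O(\log n). \qquad (11)$$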

In many situations in physics, the constrained sets $\mathcal{C}$ appearing
in the definition of $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$ consist of those ensembles that
have expectation values of observables within certain intervals.
That is, we have real-valued functions $f_i$, the observables, and
the set of ensembles $\mathcal{C}$ consists of those ensembles $\mathbb{E}$ with

$$\sum_{x \in \{0,1\}^*} \mathbb{E}(x) f_i(x) \in I_i,$$

where the sets $I_i \subset \mathbb{R}$ are intervals or possibly fixed real
numbers. Sometimes it even makes sense to allow different
intervals $I_i(\mathbb{E})$ for different ensembles $\mathbb{E}$; say, the intervals
may all be centered around the same fixed expectation value,
but may have a width which grows with the standard deviation
of $\mathbb{E}$ with respect to the observable $f_i$.

This is explored in more detail in the following example. 

Example 6 (Constraints and Observables):
Fix some string $x \in \{0,1\}^*$. Let $M$ be an index set
and $\{f_i\}_{i \in M}$ a family of real-valued constraint functions
on $\{0,1\}^*$ (the observables). We would like to define a
constrained set of ensembles $\mathcal{C}_x$ with the following property:
$\mathcal{C}_x$ shall contain all those ensembles which have expectation
values of the observables $\{f_i\}$ that are "not too far away from"
the actual values $f_i(x)$ of the observables evaluated on the
string $x$. This is done in the following way:

To each observable $f_i$ and ensemble $\mathbb{E}$, associate a corresponding
interval $I_i(\mathbb{E}) \subset \mathbb{R}$. The choice of those intervals is
somewhat arbitrary - we only demand that they contain the
expectation values of the corresponding observables, i.e.

$$\sum_{s \in \{0,1\}^*} \mathbb{E}(s) f_i(s) \in I_i(\mathbb{E}) \quad \text{for all } i \in M.$$

Then define the constrained set of ensembles $\mathcal{C}_x$ by

$$\mathcal{C}_x := \{\mathbb{E} \mid f_i(x) \in I_i(\mathbb{E}) \text{ for all } i \in M\},$$

that is, $\mathcal{C}_x$ consists of those ensembles $\mathbb{E}$ such that the
corresponding interval (centered around the corresponding expectation
value) contains the "correct" value of the observable
evaluated at $x$.

The corresponding effective complexity value

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}_x)$$

has a natural interpretation as the effective complexity of $x$ if
the observable properties $f_i$ of $x$ are judged to be important (or
are fixed as macroscopic observables). Compare the discussion
before Definition 5.
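The membership test for such a constrained set can be sketched for finitely supported ensembles as follows. The choice of intervals (expectation value plus or minus `width` standard deviations) is only one of the options mentioned in the text and is our assumption here, as are the function names and the dictionary representation of ensembles.

```python
import math

def expectation(ensemble, f):
    """Expectation value of the observable f under a finitely supported ensemble."""
    return sum(p * f(s) for s, p in ensemble.items())

def in_constraint_set(ensemble, x, observables, width=1.0):
    """Check f_i(x) in I_i(E) for all i, with I_i(E) = mean +/- width * std."""
    for f in observables:
        mean = expectation(ensemble, f)
        var = expectation(ensemble, lambda s: (f(s) - mean) ** 2)
        half = width * math.sqrt(var)
        if not (mean - half <= f(x) <= mean + half):
            return False
    return True

def count_ones(s):
    return s.count("1")

uniform2 = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
print(in_constraint_set(uniform2, "01", [count_ones]))
print(in_constraint_set(uniform2, "11", [count_ones]))
print(in_constraint_set(uniform2, "11", [count_ones], width=2.0))
```

With this choice of intervals, the Dirac measure on $x$ always passes the test, since its interval degenerates to the single point $f_i(x)$.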

To illustrate the notation introduced in the previous example,
we look at the situation when we would like to define the
effective complexity of strings under the constraint of fixed
length. That is, suppose that we consider the length $\ell(x)$ of our
string $x$ as an important regularity - or that we have additional
side information that the unknown random process generates
strings of fixed length only. In this case, it makes sense to
look at the effective complexity $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}_x)$, where

$$\mathcal{C}_x := \{\mathbb{E} \mid \ell(y) \ne \ell(x) \Rightarrow \mathbb{E}(y) = 0\}. \qquad (12)$$



Example 7 (Fixed Length Constraint):
Consider the effective complexity notion $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}_x)$ as explained
above. Instead of using Equation (12), we can also
use the notation of Example 6: we have only one constraint,
so the index set $M$ satisfies $\#M = 1$, for example $M = \{1\}$.

Our observable $f_1$ is then the characteristic function on the
strings of length $\ell(x)$, that is,

$$f_1(s) := \begin{cases} 1 & \text{if } \ell(s) = \ell(x), \\ 0 & \text{otherwise.} \end{cases}$$

To every ensemble $\mathbb{E}$, we associate an interval $I_1(\mathbb{E})$ which
only consists of the single real number that equals the corresponding
expectation value of $f_1$, i.e.

$$I_1(\mathbb{E}) = \left\{ \sum_{s \in \{0,1\}^*} \mathbb{E}(s) f_1(s) \right\} = \left\{ \sum_{\ell(s) = \ell(x)} \mathbb{E}(s) \right\} \subset \mathbb{R}.$$

It is then easy to see that the set $\mathcal{C}_x$ defined in Example 6
above equals the set in Equation (12).

Due to linearity, the constrained sets $\mathcal{C}_x$ in Example 6 are
always convex. This property, together with a computability
condition, will be useful in the following.

Definition 8 (Convex and Decidable Constraints): A set $\mathcal{C}$
of ensembles on the binary strings is called

• convex, if for every finite set of ensembles $\{\mathbb{E}_i\}_i \subset \mathcal{C}$,
every computable convex combination $\sum_i \lambda_i \mathbb{E}_i$ with $\lambda_i \in
(0,1)$ and $\sum_i \lambda_i = 1$ is also in $\mathcal{C}$,

• decidable, if there exists some algorithm that, given some
string $x \in \{0,1\}^*$ as input, decides in finite time whether
the Dirac measure on $x$ is an element of $\mathcal{C}$ or not, that
is, whether the measure

$$\delta_x(t) := \begin{cases} 1 & \text{if } t = x, \\ 0 & \text{if } t \ne x, \end{cases}$$

satisfies $\delta_x \in \mathcal{C}$ or not. In this sense, we may define
$K(\mathcal{C})$ as the length of the shortest computer program that
computes the corresponding decision function.



We proceed by analyzing some basic properties of effective 
complexity. 

IV. Basic Properties of Effective Complexity 

We have remarked that the effective complexity $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$
can be infinite, for example, if the constant $\Delta$ is too small or
the constrained set of ensembles $\mathcal{C}$ is too restrictive such that
the minimization domain satisfies $\mathcal{P}_{\delta,\Delta}(x) \cap \mathcal{C} = \emptyset$. Thus, we
start by proving a simple sufficient condition that guarantees
that effective complexity is finite.

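Lemma 9 (Finiteness of Effective Complexity): There is a
constant $m \in \mathbb{N}$ such that

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) \le K(x) + \Delta < \infty$$

for all strings $x$ with Dirac measure $\delta_x \in \mathcal{C}$, $\delta \ge 0$ and $\Delta \ge m$.

Proof: Due to (11), we only have to prove that $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$
is finite. According to Definition 5, it remains to prove that

$$\mathcal{P}_{\delta,\Delta}(x) \cap \mathcal{C} \ne \emptyset$$

under the conditions given above, where $\mathcal{P}_{\delta,\Delta}(x)$ is the minimization
domain of $x$. To this end, we show that $\delta_x \in \mathcal{P}_{\delta,\Delta}(x)$.
This follows from

• $\delta_x(x) = 1 = 2^{-H(\delta_x)(1+\delta)}$ for every $\delta \ge 0$, so $x$ is
$\delta$-typical for $\delta_x$,

• $\Sigma(\delta_x) = H(\delta_x) + K(\delta_x) = K(\delta_x) \le K(x) + m$, where
$m \in \mathbb{N}$ is a constant. □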

Our first result resembles the example on p. 51 in [3]. Suppose
that we have a random binary string $x$ of length $n$, maybe
a string which has been determined by tossing a fair coin $n$
times. The Kolmogorov complexity of such a string typically
satisfies $K(x) \approx n$, that is, structureless random strings have
maximal Kolmogorov complexity. This was one of the reasons
for S. Lloyd's and M. Gell-Mann's criticism of Kolmogorov
complexity and for their attempt to define effective complexity
as a more useful and intuitive complexity measure.

The following theorem proves that random strings indeed 
have small effective complexity, which supports the point of 
view that effective complexity measures only the complexity 
of the non-random structure of a string. Before we state that 
theorem, we have to explain in detail what we mean by a 
"random" string. 

It is well-known that most strings are incompressible, which
is what we mean by "random" at this point. In more detail,
if $r \in \mathbb{N}$ is some fixed parameter, the number of strings $x$ of
length $n$ with prefix complexity $K(x) < n + K(n) - r$ does not
exceed $2^{n-r+O(1)}$ (cf. [1, Thm. 3.3.1]). That is, most strings
are incompressible in the sense that $K(x) \ge n + K(n) - r$.
We call such strings $r$-incompressible.

Theorem 10 (Incompressible Strings are Simple): There
exists some global constant $c \in \mathbb{N}$ such that

$$\mathcal{E}_{\delta,\Delta}(x) \le \log n + O(\log\log n) \qquad (13)$$

for all $r$-incompressible strings $x$ of length $n$, $\delta \ge 0$ and
$\Delta \ge r + c$.
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Moreover, if C is a convex and decidable constrained set of 
ensembles, then for all r-incompressible strings x of length n 
with the property that the Dirac measure 6x G C, we have 

SsM^\C) <logn + 0{\og\ogn) + K{C) 

whenever S > Q and A > r+c+K{C). Note that the log log n- 
term does not depend on C. 

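Proof: With a suitable choice of $c$, the first part of the
theorem is a special case of the second part (with $\mathcal{C}$ the set
of all ensembles), so it is sufficient to prove the second part.

Suppose that $\mathcal{C}$ and $x$ satisfy the conditions of the theorem.
Let $\mathbb{E}_{x|\mathcal{C}}$ be the uniform distribution on

$$M_{x|\mathcal{C}} := \{ t \in \{0,1\}^* \mid \ell(t) = \ell(x),\ \delta_t \in \mathcal{C} \}.$$

Then $H(\mathbb{E}_{x|\mathcal{C}}) = \log \# M_{x|\mathcal{C}} \le \ell(x)$, and
$\mathbb{E}_{x|\mathcal{C}}(x) = \frac{1}{\# M_{x|\mathcal{C}}} = 2^{-H(\mathbb{E}_{x|\mathcal{C}})}$, so $x$ is
$\delta$-typical for $\mathbb{E}_{x|\mathcal{C}}$. Moreover,

$$K(\mathbb{E}_{x|\mathcal{C}}) \stackrel{+}{<} K(\ell(x)) + K(\mathcal{C}).$$

Thus, $\Sigma(\mathbb{E}_{x|\mathcal{C}}) \stackrel{+}{<} \ell(x) + K(\ell(x)) + K(\mathcal{C})$. The strings $x$
which are $r$-incompressible satisfy by definition
$K(x) \ge \ell(x) + K(\ell(x)) - r$. Denoting by
$r(x) := \ell(x) + K(\ell(x)) - K(x) \le r$ the corresponding degree
of incompressibility of $x$ gives

$$\Sigma(\mathbb{E}_{x|\mathcal{C}}) \stackrel{+}{<} K(x) + r(x) + K(\mathcal{C}),$$

i.e. there is a global constant $c \in \mathbb{N}$ such that
$\Sigma(\mathbb{E}_{x|\mathcal{C}}) \le K(x) + r(x) + K(\mathcal{C}) + c$. Since $\mathcal{C}$ is convex and
$\mathbb{E}_{x|\mathcal{C}}$ is a convex combination of Dirac measures contained in $\mathcal{C}$,
we have $\mathbb{E}_{x|\mathcal{C}} \in \mathcal{C}$. Now if $\Delta \ge r(x) + K(\mathcal{C}) + c$, it
follows from Definition 4 and $\mathbb{E}_{x|\mathcal{C}} \in \mathcal{C}$ that

$$\begin{aligned}
\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) &\le K(\mathbb{E}_{x|\mathcal{C}}) \le K(\ell(x)) + K(\mathcal{C}) + O(1) \\
&\le \log \ell(x) + O(\log\log \ell(x)) + K(\mathcal{C}). \qquad \Box
\end{aligned}$$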

Note that according to the theorem above, every string
$x \in \{0,1\}^*$ becomes effectively simple if $\Delta$ is large enough.
Indeed, for every $\delta \ge 0$, relation (13) is satisfied by $x$
if $\Delta$ is larger than the $x$-dependent threshold
$\Delta_{\max}(x) := \ell(x) + \log \ell(x) + c - K(x)$. (Here $c$ is the global constant
appearing in Theorem 10.)

For strings $x$ of fixed length $n$, one can give an $n$-dependent
threshold $\Delta_{\max}(n) := n + c$ such that
$\mathcal{E}_{\delta,\Delta}(x) \le \log n + O(\log\log n)$ if $\Delta \ge \Delta_{\max}(n)$. On the other hand, due to
Lemma 9, to ensure $\mathcal{E}_{\delta,\Delta}(x) < \infty$ for $x \in \{0,1\}^*$, one should
choose $\Delta$ not too small, namely $\Delta \ge m$, where $m$ is a global
constant depending on the reference universal computer only.

These simple observations show that a discussion of the
dependence of effective complexity for arbitrary but fixed
strings $x \in \{0,1\}^*$ on the parameter $\Delta$ should be useful for
a deeper understanding of the concept.

In a forthcoming paper [9], we investigate in more detail the
behavior of effective complexity of long strings generated by
stochastic processes for different choices of $\Delta = \Delta(n)$. For
the rest of this paper, we assume $\Delta$ to be a fixed constant (not
depending on $n$ or $x$) that is larger than the aforementioned
constant $m$.

In what follows we strengthen the result in Theorem 10 in
a way that will be interesting later in Section V.



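Corollary 11: There exists some global constant $c \in \mathbb{N}$
such that uniformly

$$\mathcal{E}_{\delta,\Delta}(x) \stackrel{+}{<} K(C(x)) + r$$

for all $r$-incompressible strings $x$, $\delta \ge 0$ and $\Delta \ge r + c$.

If $\mathcal{C}$ is a convex and decidable set of ensembles, then for
all $x$ that additionally satisfy $\mathcal{C}$ we have

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) \stackrel{+}{<} K(C(x)) + r + K(\mathcal{C})$$

whenever $\Delta \ge r + c + K(\mathcal{C})$.

Proof: It follows from the proof of Theorem 10 that

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) \stackrel{+}{<} K(\ell(x)) + K(\mathcal{C}).$$

According to the definition of $r$-incompressibility, we also
have $K(\ell(x)) \le K(x) - \ell(x) + r$, thus
$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) \stackrel{+}{<} K(x) - \ell(x) + r + K(\mathcal{C})$.
Moreover, it holds [1] that $K(x) \stackrel{+}{<} C(x) + K(C(x))$ and
$C(x) \stackrel{+}{<} \ell(x)$. □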

The fact that incompressible strings have small effective
complexity (and that most strings are incompressible) raises
the question whether there exists any string with large effective
complexity at all. Fortunately, the answer is "yes"; otherwise,
the notion of effective complexity would be an empty concept.
On the one hand, we might drop a requirement posed in
Theorem 10. There, we restricted to constrained sets $\mathcal{C}$ of
ensembles that contain the Dirac measure $\delta_x$, which basically
means that the string $x$ should itself fulfill the constraints that
are used to define $\mathcal{C}$. The effective complexity of strings $x$ that
do not meet this requirement might possibly be large.

On the other hand, even among strings that fulfill this
requirement, there are still strings that are effectively complex,
namely those strings that are called "non-stochastic" in the
context of algorithmic and Kolmogorov minimal sufficient
statistics [6]. Suppose we have a finite subset $S \subset \{0,1\}^*$
of the finite binary strings. There are elements $x \in S$ that
are easy to specify, once the set $S$ is given, in the sense that
$K(x|S)$ is small. For example, the smallest element of $S$ in
lexicographical order has very small conditional complexity
$K(x|S)$. We call such elements atypical. On the other hand,
most of the elements of $S$ will not be special in such a way,
so that we can specify them only by giving their "index"
within the set $S$, which takes about $\log \# S$ bits. Thus, most
elements $x \in S$ will have

$$K(x|S) \stackrel{+}{>} \log \# S.$$

We can call such elements typical for $S$. There is a lemma by
Gacs, Tromp, and Vitanyi [6], stating that there exist strings
which are atypical for every simple set $S$. They are called
non-stochastic:

Lemma 12 ([6, Thm. IV.2]): There are constants $c_1, c_2 \in \mathbb{N}$
such that the following holds true: Suppose $n \in \mathbb{N}$ is fixed. For
every $k < n$, there is some string $x \in \{0,1\}^n$ with $K(x|n) \le k$,
such that

$$\log \# S - K(x|S, K(S)) > n - k - c_2$$

for every $S \ni x$ with $K(S) < k - c_1$.
We want to use this result to prove that for every $n$, there are
binary strings of length $n$ that have effective complexity of
about $n$. Therefore, we have to show that basically all we do
with ensembles of strings can as well be accomplished with
equidistributions on sets:

Lemma 13 (Ensembles and Sets):
Let $x \in \{0,1\}^*$ and $\delta, \Delta \ge 0$ be arbitrary, and let $\mathcal{C}$ be a
set of decidable and convex constraints such that $\delta_x \in \mathcal{C}$.
Moreover, let $\mathcal{D}$ be an arbitrary set of constraints such that
the effective complexity $\mathcal{E}_{\delta,\Delta}(x|\mathcal{D})$ is finite, and let $\mathbb{E}$ be the
corresponding minimizing ensemble. Then, for every $\varepsilon > 0$,
there is a set $S \subset \{0,1\}^*$ containing $x$ with

$$\log \# S \le H(\mathbb{E})(1+\delta) + \varepsilon \tag{14}$$
$$K(S) \le K(\mathbb{E}) + c + K(\delta) + K(\varepsilon) + K(\mathcal{C}) \tag{15}$$

such that the equidistribution on $S$ is in $\mathcal{C}$, where $c \in \mathbb{N}$ is a
global constant.

Remark. The most interesting case is $\mathcal{D} = \mathcal{C}$, showing that the
minimizing ensembles in the definition of effective complexity
can "almost" (up to the additive terms above) be chosen to be
equidistributions on sets even in the presence of decidable and
convex constraints.

Proof: The minimizing ensemble $\mathbb{E}$ in the definition of
$\mathcal{E}_{\delta,\Delta}(x|\mathcal{D})$ has the following properties: $x$ is $\delta$-typical for
$\mathbb{E}$, and

$$K(\mathbb{E}) + H(\mathbb{E}) \le K(x) + \Delta. \tag{16}$$

We would like to write a computer program that, given a
description of $\mathbb{E}$, computes a list of all strings $y \in \{0,1\}^*$
that satisfy the constraints $\mathcal{C}$ and the inequality
$\mathbb{E}(y) \ge 2^{-H(\mathbb{E})(1+\delta)}$. Such a program could search through all strings
$y$, decide for every $y$ whether this inequality and the constraints
hold for $y$, and do this until the sum of the probabilities of all
the previously listed elements exceeds $1 - 2^{-H(\mathbb{E})(1+\delta)}$. But
there is a problem of numerics: It is in general impossible for
the program to decide with certainty if this inequality holds,
because of the unavoidable numerical error. Instead, we can
construct a computer program that computes a set $S \subset \{0,1\}^*$
with the following weaker properties:

$$y \in S \;\Rightarrow\; \mathbb{E}(y) > 2^{-H(\mathbb{E})(1+\delta)-\varepsilon} \text{ and } y \text{ satisfies } \mathcal{C},$$
$$\mathbb{E}(y) \ge 2^{-H(\mathbb{E})(1+\delta)} \text{ and } y \text{ satisfies } \mathcal{C} \;\Rightarrow\; y \in S.$$

That is, the program computes a set $S$ which definitely
contains all strings $y$ with $\mathbb{E}(y) \ge 2^{-H(\mathbb{E})(1+\delta)}$ that satisfy the
constraints, but it may also contain strings which slightly violate
this inequality as long as they still satisfy the constraints.
However, the numerical methods are chosen good enough such
that we are guaranteed that every element of $S$ has probability
larger than $2^{-H(\mathbb{E})(1+\delta)-\varepsilon}$.

This set $S$ contains $x$ and has the desired properties (14)
and (15). This can be seen as follows. By definition,
$\mathbb{E}(x) \ge 2^{-H(\mathbb{E})(1+\delta)}$, so $x \in S$. Since every element $y \in S$ has
probability $\mathbb{E}(y) > 2^{-H(\mathbb{E})(1+\delta)-\varepsilon}$, it holds $\# S < 2^{H(\mathbb{E})(1+\delta)+\varepsilon}$.
Finally, the description length of the corresponding computer
program can be estimated via $K(S|\mathbb{E}) \le c + K(\delta) + K(\varepsilon) + K(\mathcal{C})$. □
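
The construction in this proof can be sketched on a toy example. The following Python code is a minimal sketch under strong simplifying assumptions that are not part of the paper: the ensemble is finite and given as a dictionary of floating-point probabilities, the constraint set is modelled by a hypothetical predicate `satisfies` on strings, and the numerical tolerance is realized by a single cut-off placed strictly between the two thresholds. The names `entropy_bits` and `approx_typical_set` are illustrative.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a finite distribution {string: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def approx_typical_set(probs, delta, eps, satisfies=lambda y: True):
    """Return a set S with the two weaker properties used in the proof.

    Every y with probs[y] >= 2^(-H(1+delta)) that passes `satisfies` is
    included, and every included y has probs[y] > 2^(-H(1+delta)-eps).
    """
    h = entropy_bits(probs)
    # Cut-off strictly between the two thresholds: deciding p >= cut only
    # requires knowing p up to a factor of about 2^(eps/2).
    cut = 2.0 ** (-h * (1 + delta) - eps / 2)
    return {y for y, p in probs.items() if p >= cut and satisfies(y)}

# Hypothetical toy ensemble on four strings (entropy 1.75 bits).
toy = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
S = approx_typical_set(toy, delta=0.2, eps=0.1)
size_bound = 2.0 ** (entropy_bits(toy) * (1 + 0.2) + 0.1)  # 2^(H(1+delta)+eps)
print(sorted(S), len(S) < size_bound)
```

In this toy case the set contains the two most probable strings, and its size stays below the bound that corresponds to (14).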



Now we are ready to prove the existence of effectively complex
strings. First, what should we expect from "effectively
complex" strings: how large could the effective complexity
$\mathcal{E}_{\delta,\Delta}(x)$ of some string $x$ of length $n$ possibly be? If $\mathbb{E}$ is
the minimizing ensemble in the definition of $\mathcal{E}_{\delta,\Delta}(x)$, then

$$\mathcal{E}_{\delta,\Delta}(x) = K(\mathbb{E}) \le \Sigma(\mathbb{E}) \le K(x) + \Delta \le n + K(n) + O(1).$$

Thus, the best result we can hope for is the existence of strings
of length $n$ that have effective complexity close to $n$. The next
theorem shows exactly this.

Theorem 14 (Effectively Complex Strings):
For every $\delta, \Delta \ge 0$ and $n \in \mathbb{N}$, there is a string $x$ of length $n$
such that

$$\mathcal{E}_{\delta,\Delta}(x) \ge (1-\delta)n - (1+2\delta)\log n - O(\log\log n).$$

As effective complexity is increased if constraints are added,
the same statement is true for $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$ if $\mathcal{C}$ is an arbitrary
constrained set of ensembles.
Remark. An explicit lower bound is

$$\mathcal{E}_{\delta,\Delta}(x) \ge (1-\delta)n - (1+2\delta)\log n - 2\log\log n - \Delta(4+\delta) - 5K(\delta) - \omega,$$

where $\omega \in \mathbb{N}$ is a global constant.
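
To get a feeling for the size of the explicit bound, it can be evaluated numerically. The following Python sketch assumes logarithms to base 2 and treats $K(\delta)$ and $\omega$ as plain numeric inputs; the values used for them below are purely hypothetical placeholders, since the true constants depend on the reference universal computer. The function name `explicit_lower_bound` is illustrative.

```python
import math

def explicit_lower_bound(n, delta, Delta, K_delta, omega):
    """Evaluate (1-d)n - (1+2d)log n - 2 log log n - Delta(4+d) - 5K(d) - omega."""
    return ((1 - delta) * n
            - (1 + 2 * delta) * math.log2(n)
            - 2 * math.log2(math.log2(n))
            - Delta * (4 + delta)
            - 5 * K_delta
            - omega)

n = 2 ** 20
# Hypothetical constants, chosen only to show that they matter little for large n.
bound = explicit_lower_bound(n, delta=0.0, Delta=100, K_delta=10, omega=1000)
print(bound / n)  # fraction of the string length guaranteed by the bound
```

For $\delta = 0$ and moderately sized constants, the bound is already within a fraction of a percent of $n$ at $n = 2^{20}$.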

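Proof: Fix $\Delta \ge 0$, $\delta \in [0,1)$, and $x \in \{0,1\}^n$. Let $\mathbb{E}_x$
be the minimizing ensemble associated to $\mathcal{E}_{\delta,\Delta}(x)$, i.e.

$$K(\mathbb{E}_x) = \mathcal{E}_{\delta,\Delta}(x). \tag{17}$$

Choose $\varepsilon > 0$ arbitrary. According to Lemma 13, there is a
set $S_x \subset \{0,1\}^*$ such that $x \in S_x$ and

$$\log \# S_x \le H(\mathbb{E}_x)(1+\delta) + \varepsilon, \tag{18}$$
$$K(S_x) \le K(\mathbb{E}_x) + \tilde{c}, \tag{19}$$

where $\tilde{c} \in \mathbb{N}$ is the sum of a global constant and $K(\delta)$ and $K(\varepsilon)$.
Let $c$ be the best constant for our universal computer $U$ such
that

$$K(s) \le K(s|t) + K(t) + c \quad \text{for all } s, t \in \{0,1\}^*$$

and at the same time

$$K(s|t) \le K(s|t,u) + K(u) + c \quad \text{for all } s, t, u \in \{0,1\}^*.$$

Using (16), (18) and (19), we conclude that $x$ is almost typical
for $S_x$:

$$\begin{aligned}
K(x|S_x) &\ge K(x) - K(S_x) - c \\
&\ge K(\mathbb{E}_x) + H(\mathbb{E}_x) - \Delta - K(S_x) - c \\
&\ge K(S_x) - \tilde{c} + \frac{\log \# S_x - \varepsilon}{1+\delta} - \Delta - K(S_x) - c \\
&\ge \frac{\log \# S_x}{1+\delta} - \Delta - \tilde{c} - \varepsilon - c.
\end{aligned}$$

It also follows

$$\begin{aligned}
K(x|S_x, K(S_x)) &\ge K(x|S_x) - K(K(S_x)) - c \\
&\ge \frac{\log \# S_x}{1+\delta} - \Delta - 2c - K(K(S_x)) - \tilde{c} - \varepsilon.
\end{aligned} \tag{20}$$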
Now we get rid of the term $K(K(S_x))$. First note that (19),
(11) and (17) yield

$$\begin{aligned}
K(S_x) &\le K(\mathbb{E}_x) + \tilde{c} \\
&\le K(x) + \Delta + \tilde{c} \\
&\le n + 2\log n + \gamma + \Delta + \tilde{c},
\end{aligned}$$

where $\gamma \in \mathbb{N}$ is some constant such that
$K(s) \le \ell(s) + 2\log \ell(s) + \gamma$ for every $s \in \{0,1\}^*$, and
$K(k) \le \log k + 2\log\log k + \gamma$ for every $k \in \mathbb{N}$. By elementary analysis, it
holds $\log(a+b) \le \log a + \frac{b}{a}$ if $a, b > 0$. Hence there is some
constant $\kappa > 0$ which does not depend on $n$, $\delta$, $\Delta$, or $x$, such
that for $n \ge 2$,

$$\log K(S_x) \le \log n + \tilde{c} + \Delta + \kappa.$$

Using the same argument with
$K(K(S_x)) \le \log K(S_x) + 2\log\log K(S_x) + \gamma$, we get for all $n \ge 2$

$$K(K(S_x)) \le \log n + 2\log\log n + 3\tilde{c} + 3\Delta + 3\kappa + \gamma.$$

Going back to (20), it follows

$$K(x|S_x, K(S_x)) \ge \frac{\log \# S_x}{1+\delta} - \log n - 2\log\log n - \Lambda, \tag{21}$$

where

$$\Lambda := 4\tilde{c} + 4\Delta + 3\kappa + \gamma + \varepsilon + 2c. \tag{22}$$

Note that $x$ was arbitrary, so this inequality is valid for every
$x \in \{0,1\}^n$.
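
The elementary inequality used above can be checked numerically. The following Python sketch rests on an assumption about conventions that the text does not spell out at this point: with the natural logarithm the inequality $\log(a+b) \le \log a + b/a$ holds as written, whereas for base-2 logarithms the correction term needs the factor $1/\ln 2$. Either way only the value of the constant $\kappa$ is affected. The function names are illustrative.

```python
import math

def gap_natural(a, b):
    """ln(a) + b/a - ln(a+b); non-negative for all a, b > 0."""
    return math.log(a) + b / a - math.log(a + b)

def gap_base2(a, b):
    """log2(a) + b/(a ln 2) - log2(a+b); non-negative for all a, b > 0."""
    return math.log2(a) + b / (a * math.log(2)) - math.log2(a + b)

# Check both versions on a small grid of positive values.
grid = [0.5, 1.0, 2.0, 10.0, 1000.0]
all_natural_ok = all(gap_natural(a, b) >= -1e-12 for a in grid for b in grid)
all_base2_ok = all(gap_base2(a, b) >= -1e-12 for a in grid for b in grid)

# Without the 1/ln 2 factor the base-2 version can fail, e.g. for a = 2, b = 1.
base2_without_factor_fails = math.log2(3) > math.log2(2) + 1 / 2
print(all_natural_ok, all_base2_ok, base2_without_factor_fails)
```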

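Let now $K_n := \max\{K(t) \mid t \in \{0,1\}^n\}$, and

$$k := n - \lceil \delta(K_n + \Delta + \varepsilon) + \log n + 2\log\log n \rceil - \Lambda - c_2,$$

where $c_2$ is the constant from Lemma 12. If $n$ is large enough,
then $0 < k < n$ holds, and Lemma 12 applies: There is a string
$x^* \in \{0,1\}^n$ such that

$$K(x^*|S, K(S)) < \log \# S - n + k + c_2$$

for every set $S \ni x^*$ with $K(S) < k - c_1$, where $c_1$ is another
global constant. First note the following inequality:

$$\begin{aligned}
-\delta(K_n + \Delta + \varepsilon) &\le -\delta(K(x^*) + \Delta + \varepsilon) \\
&\le -\delta(H(\mathbb{E}_{x^*}) + \varepsilon) \\
&\le -\delta\left(H(\mathbb{E}_{x^*}) + \frac{\varepsilon}{1+\delta}\right) \\
&= -\frac{\delta}{1+\delta}\left(H(\mathbb{E}_{x^*})(1+\delta) + \varepsilon\right) \\
&\le \left(\frac{1}{1+\delta} - 1\right)\log \# S_{x^*}.
\end{aligned} \tag{23}$$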



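Now suppose that $K(S_{x^*}) < k - c_1$. Consequently,

$$\begin{aligned}
K(x^*|S_{x^*}, K(S_{x^*})) &< \log \# S_{x^*} - n + k + c_2 \\
&\le \log \# S_{x^*} - n - \Lambda + n - \log n - \delta(K_n + \Delta + \varepsilon) - 2\log\log n \\
&\le \frac{\log \# S_{x^*}}{1+\delta} - \log n - 2\log\log n - \Lambda.
\end{aligned}$$

But this is a contradiction to (21). Hence our assumption must
be false, and we must instead have $K(S_{x^*}) \ge k - c_1$. Thus,
using (17), (19), and $K_n \le n + 2\log n + \gamma$,

$$\begin{aligned}
\mathcal{E}_{\delta,\Delta}(x^*) = K(\mathbb{E}_{x^*}) &\ge K(S_{x^*}) - \tilde{c} \\
&\ge k - c_1 - \tilde{c} \\
&\ge n - \delta(n + 2\log n + \gamma + \Delta + \varepsilon) - \log n - 2\log\log n \\
&\quad - \Lambda - c_2 - 1 - c_1 - \tilde{c}. \qquad \Box
\end{aligned}$$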

Applying relation (11) to the case of effectively complex
strings $x^*$ constructed here, we obtain a lower bound on
$K(x^*)$:

$$(1-\delta)n - (1+2\delta)\log n - 2\log\log n - \theta \le K(x^*), \tag{24}$$

where $\theta = \Delta(5+\delta) + 5K(\delta) + \omega$. For large $n$, where the
constant $\theta$ becomes negligible, this bound is non-trivial and
in particular for $\delta = 0$ remarkably close to the maximal value
$n + K(n)$.

On the other hand, from the previous proof and Lemma 12
we deduce the following upper bound on the complexity of
$x^*$:

$$K(x^*) \le (1-\delta)n + K(n) - \log n - 2\log\log n - \Lambda + \tilde{\omega},$$

where $\tilde{\omega}$ is a global constant. This implies the following
relation between the degree of incompressibility $r(x^*)$ and
the constant $\Delta$:

$$r(x^*) \ge \delta n + \log n + 2\log\log n + \Lambda - \tilde{\omega} \ge \Lambda \ge 4\Delta,$$

where the second inequality holds for sufficiently large $n$ and
the last one uses the definition (22) of $\Lambda$. Indeed, this relation
prevents $x^*$ from falling into the domain of Theorem 10, which
would force it to have small effective complexity.

Note that all effectively complex strings must, as long as
effective complexity is finite, have large Kolmogorov complexity,
too. This follows from (11).

V. Effective Complexity and Logical Depth 

In this section, we show that effectively complex strings 
have very large computation times. In more detail, it takes a 
universal computer an astronomically large amount of time to 
compute such a string from its minimal program, or from an 
almost minimal program. 

The time it takes to compute a string from its minimal
program is discussed by C. Bennett [4] in the context of
"logical depth". The notion of logical depth formalizes the idea
that some strings are more difficult to construct than others 
(say, a string describing a proof of the Riemann hypothesis is 
harder to construct than a uniformly random string). However, 
the computation time of a string's minimal program is not 
a very stable measure for this difficulty: There might be 
programs that are "almost minimal", i.e. only a few bits longer, 
but run much faster than the minimal program. 

Thus, logical depth is defined as the shortest time to 
compute a string from its almost-minimal program. "Almost- 
minimal" means that the program is only a few bits longer 
than the minimal program, and the maximum length overhead 
is called the "significance level". 
Definition 15 (Depth of Finite Strings):
Let $x \in \{0,1\}^*$ be a string and $z \in \mathbb{N}_0$ a parameter. A string's
logical depth at significance level $z$, denoted $\mathrm{Depth}_z(x)$, will
be defined as

$$\mathrm{Depth}_z(x) := \min\{T(p) \mid \ell(p) - C(x) \le z,\ V(p) = x\},$$

where $T(p)$ denotes the halting time of the universal computer
$V$ on input $p$.
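
The definition can be illustrated on a finite toy model. The following Python sketch replaces the universal computer $V$ by a hypothetical lookup table that maps each program to its output and halting time; this table, and the names `plain_complexity` and `depth`, are assumptions made only for illustration, since neither $C$ nor the true depth is computable.

```python
def plain_complexity(x, programs):
    """Length of the shortest program in the table that outputs x."""
    return min(len(p) for p, (out, _) in programs.items() if out == x)

def depth(x, z, programs):
    """Least halting time among programs for x that are at most z bits
    longer than the shortest one (Depth_z in the toy model)."""
    c = plain_complexity(x, programs)
    return min(t for p, (out, t) in programs.items()
               if out == x and len(p) - c <= z)

# Hypothetical table: program -> (output, halting time).
toy_machine = {
    "0": ("1111", 1000),   # shortest program for "1111", but slow
    "0110": ("1111", 4),   # three bits longer, much faster
    "10": ("0101", 50),
}

for z in (0, 2, 3):
    print(z, depth("1111", z, toy_machine))
```

In the toy table the depth of "1111" drops from 1000 to 4 once the significance level reaches 3, which shows why depth is non-increasing in $z$.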

Note that we have used plain Kolmogorov complexity $C$
here, that is, the universal computer $V$ is not assumed to
be prefix-free, in contrast to the original definition [4]. As
plain and prefix Kolmogorov complexity are closely related,
this modification will not result in large quantitative changes.
However, for us it has important technical advantages, as we
will see below.

Logical depth is sometimes defined in a different manner
with reference to algorithmic probability, cf. [1, Def. 7.7.1].
However, if computation times are large (which will be the
case in our context, cf. the remark after Theorem 18), then
those different definitions are essentially equivalent [1, Claim
7.7.1].

Clearly, it takes a computer (i.e. a Turing machine) at least
$\ell(x)$ steps to print a string $x$ on its tape. Thus, the depth of a
string must be lower-bounded by its length: $\mathrm{Depth}_z(x) \ge \ell(x)$
for every $z$. Following [1], strings that have a depth almost
as small as possible, i.e. $\mathrm{Depth}_z(x) \stackrel{+}{<} \ell(x)$, will be called
shallow.

The notion of depth depends on the choice of the universal
reference computer, but not too much. As explained in [1],
there is a universal 2-tape Turing machine that simulates $t$
steps of an arbitrary $k$-tape Turing machine in time $c \cdot t \log t$,
where $c$ is a constant that only depends on the machine. In
particular, this Turing machine can implement obvious tasks
like copying $n$ bits from one tape to the other in time $n$.
Therefore, we will fix this "Hennie-Stearns machine" (cf. [1,
6.13]) as our universal reference machine for this section.

To state the first example, a string $x$ will be called
$m$-random if $C(x) \ge \ell(x) - m$.

Example 16 ([1, Ex. 7.7.3]): Random strings are shallow.
That is, there are constants $\beta, \gamma \in \mathbb{N}$ such that for every
$m$-random string $x$ it holds

$$\mathrm{Depth}_{m+\beta}(x) \le \ell(x) + \gamma.$$

As $\mathrm{Depth}_z$ is decreasing in $z$, it also follows that
$\mathrm{Depth}_z(x) \le \ell(x) + \gamma$ for all $z \ge m + \beta$.

Proof: There is always a computer program $p$ of length
$\ell(x) + \beta$ that sequentially lists the bits of $x$, producing $V(p) = x$
in the most trivial way. Such a program will have a running
time of $\ell(x)$, plus potentially some overhead $\gamma$ resulting from
initialization. If $\ell(x) \le C(x) + m$, then $\ell(p) \le C(x) + m + \beta$,
and the claim follows from the definition of depth. □

It will be interesting in the following that this conclusion
carries over to "most" strings that are $r$-incompressible as
defined before Theorem 10. This will be proved in the next
lemma. Moreover, we give a technical result which will be
useful below. Note that

$$K(x) \stackrel{+}{<} C(x) + K(C(x)) \qquad (x \in \{0,1\}^*),$$

and we will be interested in strings for which some kind of
converse holds. For this purpose, we say that a string $x$ is
$k$-well-behaved for some $k \in \mathbb{N}$ if

$$K(x) + k \ge C(x) + K(C(x)).$$

In fact, it is stated in [1] that most strings are $k$-well-behaved if
$k$ is large enough. The next lemma shows that incompressible
random strings are well-behaved.

Lemma 17 (Incompressible and Shallow Strings): For every
$n$, there are at least $2^n(1 - 2^{-r+c} - 2^{-m})$ strings of length
$n$ that are $r$-incompressible and $m$-random, where $c \in \mathbb{N}$ is a
constant. They satisfy

$$\mathrm{Depth}_{m+\beta}(x) \le \ell(x) + \gamma,$$

where $\beta, \gamma \in \mathbb{N}$ are constants. Moreover, those strings are
$k$-well-behaved, where $k \in \mathbb{N}$ is a constant that only depends on
$r$ and $m$, but not on $x$.

Proof: Recall two basic incompressibility facts that are
listed in [1]:

• There is a constant $c$ such that there are at least
$2^n(1 - 2^{-r+c})$ strings of length $n$ which are $r$-incompressible.

• For every $m \in \mathbb{N}$, there are at least $2^n(1 - 2^{-m})$ strings
of length $n$ with $C(x) \ge n - m$ (we call such strings
"$m$-random").

A simple application of the union bound gives that there are
at least $2^n(1 - 2^{-r+c} - 2^{-m})$ strings of length $n$ that are at the
same time $r$-incompressible and $m$-random. The upper bound
on the logical depth then follows from Example 16.

If $x$ is $m$-random, then $C(x) \ge \ell(x) - m$, and a converse
inequality holds trivially. The well-known continuity property
[1] of $K$ then guarantees the existence of a constant
$l \in \mathbb{N}$ (that depends only on $m$ but not on $x$) such that
$K(C(x)) \le K(\ell(x)) + l$. Moreover, if $p \in \mathbb{N}$ is a constant such
that $C(s) \le \ell(s) + p$ for all strings $s$, the $r$-incompressibility
property yields

$$\begin{aligned}
K(x) &\ge \ell(x) + K(\ell(x)) - r \\
&\ge C(x) - p + K(C(x)) - l - r.
\end{aligned}$$

It follows that $x$ is $(p + l + r)$-well-behaved. □
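
The two counting facts used in this proof can be made concrete. The following Python sketch shows the standard pigeonhole count behind the second fact (there are only $2^{n-m} - 1$ binary programs shorter than $n - m$) and evaluates the union-bound fraction. The constant `c` is hypothetical here, because its true value depends on the reference machine, and the function names are illustrative.

```python
def min_count_m_random(n, m):
    """Pigeonhole lower bound on the number of length-n strings with
    C(x) >= n - m: at most 2^(n-m) - 1 programs are shorter than n - m."""
    shorter_programs = 2 ** (n - m) - 1 if n >= m else 0
    return 2 ** n - shorter_programs

def union_bound_fraction(r, c, m):
    """Fraction 1 - 2^(-r+c) - 2^(-m) of strings that are simultaneously
    r-incompressible and m-random (clipped at 0 when the bound is vacuous)."""
    return max(0.0, 1.0 - 2.0 ** (-r + c) - 2.0 ** (-m))

print(min_count_m_random(10, 3))            # at least this many of the 1024 strings
print(union_bound_fraction(r=10, c=2, m=8)) # c = 2 is a made-up constant
```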

We will now show that effectively complex strings must
have very large logical depth if they are well-behaved. This
is in contrast to the fact that incompressible strings (which
have small effective complexity according to Theorem 10) are
mostly shallow, that is, have very small depth as shown in
Lemma 17.

The main idea is as follows: Suppose that an almost-minimal
program for the string $x$ has a rather short halting
time $T$. Then we can consider the ensemble $\mathbb{E}$ that is defined
by equidistribution over all strings that have a short program
(of length less than or equal to $C(x)$) with some halting time less
than or equal to $T$. Such an ensemble is simple, $x$ is typical
for it, and it has small total information. Thus, it forces the
effective complexity of $x$ to be small.
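
This idea can be sketched on a finite toy model. In the following Python code the universal computer is again replaced by a hypothetical table of programs with outputs and halting times (an assumption of this example, not part of the paper); the set of all outputs of programs of length at most `y` that halt within a time bound plays the role of the support of the ensemble, and its uniform distribution has entropy at most `y + 1` bits.

```python
import math

def outputs_within(programs, y, time_bound):
    """Outputs of all programs of length <= y that halt within time_bound steps."""
    return {out for p, (out, t) in programs.items()
            if len(p) <= y and t <= time_bound}

def uniform_entropy(support):
    """Entropy in bits of the uniform distribution on a finite non-empty set."""
    return math.log2(len(support))

# Hypothetical table: program -> (output, halting time).
toy_machine = {
    "0": ("00", 3),
    "1": ("1111", 9),
    "01": ("1111", 2),
    "11": ("0101", 40),    # too slow for the bound used below
    "000": ("0110", 5),    # too long for y = 2
}

support = outputs_within(toy_machine, y=2, time_bound=10)
print(sorted(support), uniform_entropy(support))
```

There are fewer than $2^{y+1}$ binary programs of length at most $y$, so the support always has fewer than $2^{y+1}$ elements, which is the counting step used in the proof below.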

Theorem 18 (Effective Complexity and Depth): There is a
global constant $\omega \in \mathbb{N}$ with the following property: Suppose
that $f : \mathbb{N} \to \mathbb{N}$ is a strictly increasing, computable function.
Furthermore, suppose that $x$ is a $k$-well-behaved string. If the
effective complexity of $x$ satisfies

$$\mathcal{E}_{\delta,\,k+z+K(z)+K(f)+\omega+1}(x) > K(C(x)) + K(z) + K(f) + \omega$$

for some arbitrary $\delta \ge 0$ and $z \in \mathbb{N}$, then

$$\mathrm{Depth}_z(x) > f(C(x)).$$

Remark. Before we prove this theorem, we explain its meaning
and implications. First, since $C(x) \stackrel{+}{<} \ell(x)$, it follows that
$K(C(x))$ is of the order $\log \ell(x)$ or less, which is quite small.
Thus, the inequality for $\mathcal{E}$ in this theorem is a very weak
condition.

The function $f$ can be chosen to be simple (such that $K(f)$
is small), but extremely rapidly growing, for example of the
form

$$f(n) := n^{n^{\cdot^{\cdot^{\cdot^{n}}}}} \quad \text{(power tower of height } n\text{)}.$$

Thus, the consequence of this theorem is that the depth must
be astronomically large.
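
To see how a very short program can describe such a rapidly growing function, here is a minimal Python sketch of the power tower. The function name `power_tower` is illustrative, and only very small arguments can actually be evaluated.

```python
def power_tower(n: int) -> int:
    """Power tower n^(n^(...^n)) of height n, evaluated from the top down."""
    result = 1
    for _ in range(n):
        result = n ** result
    return result

print(power_tower(1), power_tower(2), power_tower(3))
# power_tower(4) = 4^(4^(4^4)) already has far too many digits to compute.
```

The program text stays a few lines long for every $n$, which is why $K(f)$ is small, while the values outgrow any feasible computation time almost immediately.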

Proof: As $\mathcal{E}_{\delta,\Delta}$ is decreasing in $\delta$, it is sufficient to
prove the theorem for the case $\delta = 0$. For $y \in \mathbb{N}$ and every
computable function $f : \mathbb{N} \to \mathbb{N}$, define the set

$$\mathcal{T}_{y,f} := \{x \in \{0,1\}^* \mid \exists p \in \{0,1\}^* \text{ with } V(p) = x,\ \ell(p) \le y,\ T(p) \le f(y)\},$$

where $T(p)$ denotes the halting time of the (plain, not prefix)
universal computer $V$ on input $p$. Clearly $\# \mathcal{T}_{y,f} \le 2^{y+1}$, and if
$\mathbb{E}_{y,f}$ is the uniform distribution on $\mathcal{T}_{y,f}$, then $H(\mathbb{E}_{y,f}) \le y + 1$.
Moreover, there is some constant $c \in \mathbb{N}$ such that
$K(\mathbb{E}_{y,f}) \le K(y) + K(f) + c$. Hence the total information satisfies

$$\Sigma(\mathbb{E}_{y,f}) \le y + K(y) + K(f) + c + 1. \tag{25}$$

Now let $x \in \{0,1\}^*$ be a $k$-well-behaved string, and let $z \in \mathbb{N}$
be arbitrary. Suppose that $x \in \mathcal{T}_{C(x)+z,f}$; then $x$ is 0-typical
for $\mathbb{E} := \mathbb{E}_{C(x)+z,f}$. Since there exists some global constant
$d \in \mathbb{N}$ such that

$$K(a+b) \le K(a) + K(b) + d \quad \text{for every } a, b \in \mathbb{N},$$

we can estimate, using (25),

$$\begin{aligned}
\Sigma(\mathbb{E}) &\le C(x) + z + K(C(x)+z) + K(f) + c + 1 \\
&\le C(x) + K(C(x)) + z + K(z) + K(f) + d + c + 1 \\
&\le K(x) + \underbrace{k + z + K(z) + K(f) + c + d + 1}_{=:\,\Delta}.
\end{aligned}$$

By definition of effective complexity, we get

$$\begin{aligned}
\mathcal{E}_{0,\Delta}(x) &\le K(\mathbb{E}) \le K(C(x)+z) + K(f) + c \\
&\le K(C(x)) + K(z) + K(f) + c + d.
\end{aligned} \tag{26}$$

In summary, we have so far shown the following: if there is
a program $p \in \{0,1\}^*$ with $V(p) = x$ and $\ell(p) \le C(x) + z$
such that the corresponding halting time satisfies
$T(p) \le f(C(x)+z)$, then the effective complexity is as small as in
(26). The negation of this statement, together with
$f(C(x)+z) \ge f(C(x))$ and $\omega := c + d$, proves the theorem. □

Interestingly, incompressible strings just slightly fail to
fulfill the inequality of the previous theorem. According to
Corollary 11, $r$-incompressible strings $x$ have effective
complexity $\mathcal{E}_{\delta,r+c}(x) \stackrel{+}{<} K(C(x)) + r$. Thus, we cannot conclude
that those strings have large depth. This is fortunate, because
most $r$-incompressible strings are in fact shallow according
to Lemma 17.

Thus, it follows that the expression $K(C(x))$ sharply marks
the "edge of depth", in the sense that strings with larger
effective complexity always have extremely large depth, but
strings with smaller effective complexity can have arbitrarily
small depth. In some sense, a phenomenon similar to a "phase
transition" occurs at effective complexity of $K(C(x))$ (apart
from additive constants that get less and less important in the
"thermodynamic limit" $\ell(x) \to \infty$).

This behavior is schematically depicted in Figure 2. The
previous theorem says that if the effective complexity exceeds
$K(C(x))$ (omitting all additive constants here), then the logical
depth must be astronomically large. On the other hand, if
effective complexity is smaller, different values of depth seem
possible. In particular, if $x$ is $r$-incompressible, then we know
from Corollary 11 that effective complexity is (possibly only
up to a few bits) smaller than $K(C(x))$, while the logical depth
is as small as possible (of the order $\ell(x)$) due to Lemma 17.

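[Figure 2: schematic plot of $\mathrm{Depth}(x)$ (vertical axis, with the value $n \approx \ell(x)$ marked) against the effective complexity $\mathcal{E}(x)$ (horizontal axis, with marks at $K(C(x))$ and $n = \ell(x)$). To the right of $K(C(x))$ the depth is "absurdly large"; to the left it is marked "??".]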

Fig. 2. At effective complexity equal to K{C(x)) logical depth suddenly 
becomes astronomically large. This is reminiscent of the phenomenon of 
"phase transition" known from statistical mechanics. 

This theorem can easily be extended to the case of effective
complexity with constraints as long as the constrained sets of
ensembles satisfy the usual technical conditions:

Theorem 19 ($\mathcal{E}$ and Depth with Constraints): There is a
global constant $\omega \in \mathbb{N}$ with the following property: Suppose
$f : \mathbb{N} \to \mathbb{N}$ is a strictly increasing, computable function.
Furthermore, suppose that $x$ is a $k$-well-behaved string, and
the constrained set $\mathcal{C}$ is decidable and convex and contains the
Dirac measure $\delta_x$. If the effective complexity of $x$ satisfies

$$\mathcal{E}_{\delta,\Delta}(x|\mathcal{C}) > K(C(x)) + K(z) + K(f) + K(\mathcal{C}) + \omega$$

for some $\Delta \ge k + z + K(z) + K(f) + \omega + 1 + K(\mathcal{C})$ with
$\delta \ge 0$ and $z \in \mathbb{N}$, then

$$\mathrm{Depth}_z(x) > f(C(x)).$$

Remark. For an explanation and interpretation of this theorem,
see the remarks after Theorem 18 above.
Proof: The proof is almost identical to that of Theorem 18
above. The only modification is that the set $\mathcal{T}_{y,f}$ has to be
replaced by a set

$$\mathcal{T}_{y,f,\mathcal{C}} := \{x \in \{0,1\}^* \mid \exists p \in \{0,1\}^* \text{ with } V(p) = x,\ \ell(p) \le y,\ T(p) \le f(y),\ \delta_x \text{ satisfies } \mathcal{C}\}.$$

The convexity condition then ensures that the uniform distribution
on $\mathcal{T}_{y,f,\mathcal{C}}$, called $\mathbb{E}_{y,f,\mathcal{C}}$, is contained in $\mathcal{C}$. This
construction enlarges every additive constant by a term $K(\mathcal{C})$,
i.e. the complexity of a computer program that is able to test
for strings if the corresponding Dirac measures are contained
in $\mathcal{C}$. □

VI. Effective Complexity and Kolmogorov
Minimal Sufficient Statistics

Now we study the relation between effective complexity
without constraints and Kolmogorov minimal sufficient statistics
(KMSS). For more information on Kolmogorov minimal
sufficient statistics and related notions, see [1, 2.2.2].

For strings $x$ and integers $k \in \mathbb{N}$, we can define a version
of the Kolmogorov structure function $H_k(x|n)$ by

$$H_k(x|n) := \min\{\log \# A \mid A \subset \{0,1\}^n,\ x \in A,\ K_*(A|n) \le k\}, \tag{27}$$

i.e. $H_k(x|n)$ is the logarithm of the minimal size of any subset
of strings of length $n$ which contains the string $x$ and has
complexity upper bounded by $k$, given $n$. The corresponding
minimal set will be called $A_k$ (if there are several minimizers,
we take the first set in some canonical order). Hence
$H_k(x|n) = \log \# A_k$ and $K_*(A_k|n) \le k$.
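
The minimization in (27) can be illustrated on a finite toy model. In the following Python sketch the candidate sets and, in particular, their complexity values are hypothetical numbers chosen for illustration (real values of $K_*(A|n)$ are not computable); the function name `structure_function` is likewise an assumption of this example.

```python
import math

def structure_function(x, k, candidates):
    """log2 of the size of the smallest candidate set that contains x and
    has (assumed) complexity at most k; infinity if no such set exists."""
    sizes = [len(A) for A, complexity in candidates
             if x in A and complexity <= k]
    return math.log2(min(sizes)) if sizes else math.inf

# Hypothetical candidates for n = 3: (set of strings, assumed complexity).
all_strings = frozenset(format(i, "03b") for i in range(8))
candidates = [
    (all_strings, 1),                 # very simple, but large
    (frozenset({"000", "111"}), 3),   # more complex, smaller
    (frozenset({"000"}), 5),          # singleton: most complex, size 1
]

for k in (0, 1, 3, 5):
    print(k, structure_function("000", k, candidates))
```

As the complexity budget $k$ grows, the value can only decrease, which is the typical shape of a structure function.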

Definition 20 (KMSS):
Let $x$ be a string of length $n$ and denote by $k_\Delta(x)$ the minimal
natural number $k$ satisfying

$$H_k(x|n) + k \le K_*(x|n) + \Delta. \tag{28}$$

A minimal program $k^*_\Delta(x)$ for $A_{k_\Delta(x)}$ is called Kolmogorov
minimal sufficient statistics of the string $x$.

Originally, the Kolmogorov structure function as well as
Kolmogorov minimal sufficient statistics were defined by using
plain Kolmogorov conditional complexity $C(\cdot|\cdot)$ in (27) and (28)
instead of Chaitin's prefix version $K_*(\cdot|\cdot)$. It holds

$$\ell(k^*_\Delta(x)) = K(A_{k_\Delta(x)}) \stackrel{+}{<} k_\Delta(x) + K(n). \tag{29}$$

Moreover, $k_\Delta(x)$ can equivalently be written as

$$k_\Delta(x) = \min\{K_*(A|n) \mid \log \# A + K_*(A|n) \le K_*(x|n) + \Delta,\ x \in A \subset \{0,1\}^n\}. \tag{30}$$

This formula looks very similar to the definition of (unconstrained)
effective complexity, as given in Definition 4. Hence
the Kolmogorov minimal sufficient statistics is approximately
the minimal program of the minimizing set within the
minimization domain of effective complexity. We will explore this
observation in more detail in the following lemma.

Lemma 21: There is a constant $c \in \mathbb{N}$ such that for all
$\delta, \Delta \ge 0$ it holds

$$\mathcal{E}_{\delta,\Delta+c}(x) \stackrel{+}{<} \ell(k^*_\Delta(x)) \stackrel{+}{<} k_\Delta(x) + K(n) \tag{31}$$

uniformly in $x \in \{0,1\}^*$, where $n := \ell(x)$. Moreover, there
is a constant $g \in \mathbb{N}$ such that for all $\delta, \Delta \ge 0$

$$\begin{aligned}
\mathcal{E}_{\delta,\Delta}(x) &\stackrel{+}{>} k_{K(n)+\Delta+\delta(K(x)+\Delta)+K(\delta)+g}(x) - K(\delta) \\
&\stackrel{+}{>} \ell\left(k^*_{K(n)+\Delta+\delta(K(x)+\Delta)+K(\delta)+g}(x)\right) - K(\delta) - K(n)
\end{aligned} \tag{32}$$

uniformly in $x \in \{0,1\}^*$.

Proof: Let $k := k_\Delta(x)$ and $n := \ell(x)$. By definition,

$$k \ge K_*(A_k|n) \stackrel{+}{=} K(A_k, n) - K(n) \stackrel{+}{>} K(A_k) - K(n).$$

Let $\mathbb{E}_k$ be the uniform distribution on $A_k$. It follows

$$\begin{aligned}
H(\mathbb{E}_k) + K(\mathbb{E}_k) &\stackrel{+}{=} \log \# A_k + K(A_k) \\
&= H_k(x|n) + K(A_k) \\
&\stackrel{+}{<} H_k(x|n) + k + K(n) \\
&\le K_*(x|n) + \Delta + K(n) \\
&\stackrel{+}{=} K(x, n) + \Delta \\
&\stackrel{+}{=} K(x) + \Delta.
\end{aligned}$$

Thus, there is some constant $c \in \mathbb{N}$ such that

$$\mathcal{E}_{\delta,\Delta+c}(x) \le K(\mathbb{E}_k) \stackrel{+}{<} K(A_k).$$

Then (31) follows from (29).

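In order to show (32), we use Lemma 13. Let $\mathbb{E}$ be the
minimizing ensemble in the definition of $\mathcal{E}_{\delta,\Delta}(x)$. In particular,
it holds

$$K(\mathbb{E}) = \mathcal{E}_{\delta,\Delta}(x). \tag{33}$$

Fix any $\varepsilon > 0$. Due to Lemma 13, there is a set $S' \subset \{0,1\}^*$
containing $x$ such that

$$\log \# S' \stackrel{+}{<} H(\mathbb{E})(1+\delta),$$
$$K(S') \stackrel{+}{<} K(\mathbb{E}) + K(\delta).$$

Let now $S := S' \cap \{0,1\}^n$. It still holds $\log \# S \stackrel{+}{<} H(\mathbb{E})(1+\delta)$
and $K(S) \stackrel{+}{<} K(S') + K(n) \stackrel{+}{<} K(\mathbb{E}) + K(\delta) + K(n)$. Since
$K_*(S|n) \stackrel{+}{=} K(S, n) - K(n) \stackrel{+}{=} K(S) - K(n)$, we get the
chain of inequalities

$$\begin{aligned}
\log \# S + K_*(S|n) &\stackrel{+}{<} H(\mathbb{E})(1+\delta) + K(S) - K(n) \\
&\stackrel{+}{<} H(\mathbb{E}) + \delta H(\mathbb{E}) + K(\mathbb{E}) + K(\delta) \\
&\le K(x) + \Delta + \delta H(\mathbb{E}) + K(\delta) \\
&\stackrel{+}{<} K_*(x|n) + K(n) + \Delta + K(\delta) + \delta(K(x) + \Delta).
\end{aligned}$$

Using (30) and (33), it follows that

$$\begin{aligned}
k_{K(n)+\Delta+\delta(K(x)+\Delta)+K(\delta)+g}(x) &\le K_*(S|n) \\
&\stackrel{+}{=} K(S) - K(n) \\
&\stackrel{+}{<} K(\mathbb{E}) + K(\delta) \\
&= \mathcal{E}_{\delta,\Delta}(x) + K(\delta).
\end{aligned}$$

Then (32) follows again from (29). □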
VII. Conclusions

We have given a formal definition of effective complexity 
and rigorous proofs of its basic properties. In particular, 
we have shown that there is an interesting relation between 
effective complexity and logical depth: the depth of a string
$x$ is astronomically large if the effective complexity exceeds
$K(C(x))$; otherwise, it can be arbitrarily small.

This statement is true up to a few technical conditions 
and up to certain additive constants. These constants become 
less and less important for longer and longer strings — this 
is comparable to the "thermodynamic limit" in statistical 
mechanics, and the behavior can be compared to that of a 
phase transition. 

So how useful is effective complexity for the study of natu- 
ral systems? We do not yet know the answer to this question, 
but we hope that our mathematically rigorous approach gives 
the first steps towards possible applications within mathemat- 
ics or theoretical computer science. 

At least, we have shown that effective complexity has
interesting properties; for example, there are strings that have
effective complexity close to their lengths. Such strings are
rare: "most" strings are in fact effectively
simple, which follows from Theorem 10. But this property is
unavoidable: Recall that a major motivation for the definition 
of effective complexity was that random strings should be 
simple (in contrast to Kolmogorov complexity). Now since 
almost all strings are random, it follows that almost all strings 
must be effectively simple. 

Most of our results concerning the constrained version
of effective complexity $\mathcal{E}_{\delta,\Delta}(x|\mathcal{C})$ were derived under the
assumption that the Dirac measure $\delta_x$ is an element of $\mathcal{C}$.
Although this is a natural assumption, the behavior of effective 
complexity might as well be completely different if it is 
dropped. Investigating such situations in more detail could be 
useful in order to get better insight into the concept of effective 
complexity and its limitations. 

Finally, a possible field of application of effective complex- 
ity might be statistical mechanics, where the notion of entropy 
and algorithmic complexity have both already led to interesting 
conclusions. After all, the constraints given by the set $\mathcal{C}$ can
be interpreted as macroscopic observables as discussed in
Section III, and we have already compared our result on logical
depth with certain notions of statistical mechanics.

Appendix A
An Example of Non-Computable Entropy

As promised in the introduction, we give an explicit
construction of a computable ensemble $\mathbb{E}$ with the property that
the entropy $H(\mathbb{E})$ is finite but not computable:

Example 22 (Non-Computable Entropy):
For every $n \in \mathbb{N}$, let $A_n$ be the set of strings that start with
exactly $n - 1$ zeroes, such that the $n$-th bit either does not
exist or is a one. That is,

$$A_1 = \{\lambda, 1, 10, 11, 100, 101, \ldots\},$$
$$A_2 = \{0, 01, 010, 011, 0100, 0101, \ldots\},$$
$$A_3 = \{00, 001, 0010, 0011, 00100, 00101, \ldots\}$$

and so on, where $\lambda$ denotes the empty string. Clearly, this is a
computable partition of $\{0,1\}^*$ into countably-infinite, mutually
disjoint subsets: $\bigcup_{n \in \mathbb{N}} A_n = \{0,1\}^*$, $A_m \cap A_n = \emptyset$ if $m \ne n$.

We now construct an ensemble $\mathbb{E}$ such that every set $A_n$
has weight $2^{-n}$, that is $\mathbb{E}(A_n) := \sum_{x \in A_n} \mathbb{E}(x) = 2^{-n}$. We
distribute the weight $2^{-n}$ among the members of $A_n$ in such a way
that the resulting ensemble $\mathbb{E}$ is computable, but has
non-computable entropy. This is done as follows: let $\Omega > 0$ be a
real number which is not computable, but enumerable from
below. That is, there exists a computable sequence $(\Omega_n)_{n \in \mathbb{N}}$
with $\Omega_1 := 0$ which is increasing, i.e. $\Omega_{n+1} \ge \Omega_n$, and which
converges to $\Omega$, i.e. $\lim_{n \to \infty} \Omega_n = \Omega$. For example, we may
use Chaitin's Omega number

$$\Omega := \sum_{x:\ U(x) \text{ exists}} 2^{-\ell(x)},$$

where the sum is over all strings $x \in \{0,1\}^*$ such that the
universal prefix computer $U$ halts on input $x$. The number $\Omega$
gives the probability that the computer $U$ halts on randomly
chosen input. It is a real number between zero and one, and it
is obviously enumerable from below, but it is not computable.

Given some weight $c \in (0,1]$ and finitely many positive real
numbers $r_i$ such that $\sum_i r_i = c$, the resulting entropy sum
$-\sum_i r_i \log r_i$ will always be larger than or equal to $-c \log c$.
The converse is also true: Given some fixed entropy value
$s \ge -c \log c$, we can always find finitely many positive real
numbers $r_i$ with $\sum_i r_i = c$ such that $-\sum_i r_i \log r_i = s$. Such
a list of real numbers can be found in an obvious systematic
way that can be implemented as a computer program.
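
One possible systematic way is sketched below in Python. This is not necessarily the method the authors have in mind: it is a minimal sketch that assumes base-2 logarithms, works with floating-point numbers, and therefore meets the target entropy only approximately. It interpolates between a single atom of weight $c$ (entropy $-c \log c$) and the uniform split over $k$ atoms, along which the entropy sum increases monotonically, and locates the target by bisection. The names `entropy_sum` and `weights_with_entropy` are illustrative.

```python
import math

def entropy_sum(rs):
    """Entropy sum -sum r_i log2 r_i over the positive entries."""
    return -sum(r * math.log2(r) for r in rs if r > 0)

def weights_with_entropy(c, s, iterations=200):
    """Finitely many positive numbers summing to c with entropy sum close to s."""
    s_min = -c * math.log2(c)
    if s < s_min:
        raise ValueError("target entropy must be at least -c log c")
    # Smallest k for which the uniform split over k atoms reaches entropy >= s.
    k = 1
    while s_min + c * math.log2(k) < s:
        k += 1
    if k == 1:
        return [c]

    def family(t):
        # t = 0: one atom of weight c; t = 1: uniform over k atoms.
        small = t * c / k
        return [c - (k - 1) * small] + [small] * (k - 1)

    lo, hi = 0.0, 1.0
    for _ in range(iterations):  # bisection on the monotone entropy curve
        mid = (lo + hi) / 2
        if entropy_sum(family(mid)) < s:
            lo = mid
        else:
            hi = mid
    return family(hi)

rs = weights_with_entropy(c=0.25, s=0.8)
print(rs, sum(rs), entropy_sum(rs))
```

In the example of this appendix one would take $c = 2^{-n}$ and the target value $s = -2^{-n} \log 2^{-n} + \Omega_{n+1} - \Omega_n$.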

Thus, to every $n \in \mathbb{N}$, we may systematically distribute the
weight $2^{-n}$ to finitely many strings in $\{x_1, \ldots, x_k\} \subset A_n$
such that the corresponding probabilities $\mathbb{E}(x_i)$ have entropy
sum $-2^{-n} \log 2^{-n} + \Omega_{n+1} - \Omega_n$, i.e.

$$-\sum_{x \in A_n} \mathbb{E}(x) \log \mathbb{E}(x) = -2^{-n} \log 2^{-n} + \Omega_{n+1} - \Omega_n.$$

The resulting ensemble is obviously computable, and the
entropy is

$$\begin{aligned}
H(\mathbb{E}) &= -\sum_{n=1}^{\infty} \sum_{x \in A_n} \mathbb{E}(x) \log \mathbb{E}(x) \\
&= \sum_{n=1}^{\infty} \left(-2^{-n} \log 2^{-n} + \Omega_{n+1} - \Omega_n\right) \\
&= 2 + \Omega.
\end{aligned}$$

This is not a computable number.